ABSTRACT
with significant improvement in student achievement. A fifth study,
designed to test these findings, failed to corroborate the positive
results obtained previously. After a year-long study,
than the control-group teachers did. The classes of these two groups
of teachers were not appreciably different in end-of-year student 
academic achievement. Training-related behavior among 
experimental-group teachers was not modified enough to effect 
appreciable changes in subsequent student achievement. These results,
and the results of the previous four studies, are analyzed, and 
recommendations are made for further research. (JD) 

Introduction

Direct instruction represents a constellation of 
teacher behaviors and classroom characteristics — "a 
convergence of results" (Rosenshine & Berliner, 1978, 
p. 3) — that has been identified in the accumulation of 
process-product research. Further, the direct instruction
model has been regarded as foremost in explaining
growth in conventionally measured achievement, espe- 
cially at the elementary-grade level (Berliner, 1979; 
Berliner & Rosenshine, 1977; Good, 1979b; Powell, 
1978). Powell (1978; also see Rosenshine, 1979, p. 38) 
offered perhaps the most succinct presentation of sev- 
eral key components of this model: 

The coverage of content is extensive, time is 
allocated to academic tasks, and the time is 
not broken by frequent interruptions or changes 
of task. Students spend a good portion of the 
time allocated to instruction actually engaged 
in instructional tasks, and the teacher moni- 
tors and encourages task engagement on the 
part of the students. . . .The atmosphere in 
the classroom is one in which academic work is 
both recognized to be important and performed.
(p. 29)

Experimental Research

To date, four classroom-based experiments have been 
conducted that incorporated the direct instruction model: 
Anderson, Evertson, and Brophy (1979); Crawford, Gage,
Corno, Stayrook, Mitman, Schunk, Stallings, Baskin,
Harvey, Austin, Cronin, and Newman (1978); Good and Grouws
(1979); and Stallings, Needels, and Stayrook (1979). In
each case, findings from previous correlational studies
of process-product relationships were assembled into clear,
concise reading material for teachers. Further, random 
assignment was employed in assigning the classes or 
schools to experimental conditions. 

The training programs developed by Anderson et al.
(1979), Good and Grouws (1979), and Stallings et al. (1979)
were based largely on process-product relationships reported
in Brophy and Evertson (1979), Good and Grouws (1977),
and Stallings, Corey, Fairweather, and Needels (1978),
respectively. Anderson et al. (1979) acknowledged the
additional influence of Blank (1973) and the Southwest
Educational Development Laboratory (1973). The training
program developed by Crawford et al. (1978) involved
the comprehensive examination and synthesis of the
results of four large-scale correlational studies (Brophy
& Evertson, 1974; McDonald & Elias, 1976; Soar, 1973;
Stallings & Kaskowitz, 1974).

Because they have been conducted in regular class- 
rooms, rather than specially contrived settings, these 
four experiments rate high in ecological validity. They 
have the realism of being concerned with teaching that 
has gone on over an extended time of several months or, 
more typically, the entire school year, rather than a 
few hours, days, or weeks. The teachers in these experiments
have been practicing teachers, rather than student
teachers or teachers specially selected and employed for
the research project. And, because the teaching recommendations
manipulated in these experiments were derived
from studies of naturally occurring teaching behaviors,
it was known in advance that this manipulation would
call for no esoteric behavior that was alien to regular
classrooms.

Although for the most part relying on different data 
bases for the development of the training programs, these 
experiments have in common the theme of direct instruction.
In addition to demonstrating positive change in
training-related teaching practices of experimental- 
group teachers, each experiment resulted in greater gains 
in student achievement for experimental-group classes 
when compared to control-group classes. Each experiment 
is described here.

Anderson et al. (1979). This experiment was conducted
in White, middle socioeconomic-status (SES), first-grade
classes. Schools were randomly assigned to treatments,
after stratifying on school size and SES. The
dependent measure was the total reading score on the
Metropolitan Achievement Tests; the total readiness score
on the Metropolitan Readiness Tests served as a covariate.
Experimental-group teachers received a manual presenting
an instructional model, which set forth 22 principles.

The treatment was minimal in cost and time. In 
October, the researchers met with teachers in
the treatment schools and described the purpose 
of the study. The teachers who agreed to par- 
ticipate read the manual describing the instruc- 
tional model and met again with the experi- 
menters to discuss it. There was no further 
training, and no attempts were made during the
year to boost the treatment. (p. 195) 

Observations were conducted in all control-group
classes and in 10 of the 17 experimental-group classes.
Between November and May of the school year, each of
these classes was observed on 15 to 20 occasions, roughly
once a week. A specially constructed observation instrument
was used that allowed the investigators to "measure
implementation of the principles in the instructional
model as well as other aspects of first-grade reading
instruction that might be important in assessing students'
achievement" (p. 198).

Because not all experimental-group classes were ob- 
served, the question could be addressed: Did the presence 
of observers moderate the effect of the treatment? That 
is, did experimental-group teachers who were observed
have classes with greater achievement gains than the
classes of the unobserved experimental-group teachers? 

A series of between-class regression equations
was employed to assess treatment effects, covariate-treatment
interactions, and observation effects (i.e.,
observed versus unobserved experimental-group teachers).
There was a significant treatment effect (p < .05):
After the dependent variable was regressed on the covariate,
an additional 10% of the variance was accounted
for by entering the treatment term. From similar
analyses, it was found that there was neither a statistically
significant covariate-treatment interaction nor
a statistically significant observation effect. The
former finding indicated that, by conventional standards,
homogeneity of regressions could be assumed; the latter
indicated that the presence of classroom observers did
not moderate the treatment effect on student achievement.
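
The incremental-variance logic described above can be illustrated with
a minimal sketch. It assumes class-level data (one covariate score, one
treatment indicator, and one posttest score per class). The function
names and the numbers are hypothetical and are not taken from Anderson
et al. (1979). The sketch enters the covariate first and then reports
how much additional variance the treatment term accounts for.

```python
from statistics import mean


def residuals(x, y):
    """Residuals of y after a least-squares line on x (with intercept)."""
    mx, my = mean(x), mean(y)
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def r2_increment(covariate, treatment, outcome):
    """Return (R^2 for the covariate alone, additional R^2 from treatment)."""
    my = mean(outcome)
    ss_total = sum((yi - my) ** 2 for yi in outcome)
    e_y = residuals(covariate, outcome)    # outcome with covariate removed
    e_t = residuals(covariate, treatment)  # treatment with covariate removed
    r2_reduced = 1 - sum(e ** 2 for e in e_y) / ss_total
    # Regressing residual on residual gives the full-model residuals
    # (Frisch-Waugh-Lovell theorem).
    e_full = residuals(e_t, e_y)
    r2_full = 1 - sum(e ** 2 for e in e_full) / ss_total
    return r2_reduced, r2_full - r2_reduced


# Hypothetical class means: readiness (covariate), group (0/1), reading score.
readiness = [40, 45, 50, 55, 60, 42, 48, 52, 58, 62]
group = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
reading = [41, 47, 49, 57, 59, 47, 52, 58, 61, 68]
r2_cov, r2_added = r2_increment(readiness, group, reading)
print(f"covariate alone: {r2_cov:.3f}; added by treatment: {r2_added:.3f}")
```

The second value returned corresponds to the kind of quantity reported
above as the additional variance accounted for by the treatment term.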

Crawford et al. (1978). Here, the context was 33
middle-SES, third-grade classes. Volunteer teachers,
after their classes were stratified on mean academic
achievement, were randomly assigned to three experimental
conditions: observation only (N = 10), minimal training
plus observation (N = 11), or maximal training plus observation
(N = 12). Minimally trained teachers simply
were mailed at weekly intervals a series of five training
packets, which embodied 22 principles (only coincidentally
the same number of principles used in the experiment
conducted by Anderson et al. [1979]). The maximally
trained teachers, in addition to receiving the weekly
packets, attended weekly meetings in the five-week
period during which the training packets were delivered. 
These meetings were devoted to review and discussion, 
along with videotape viewing and role playing. 

As was noted above, all classes in the three groups 
were observed. The observations were performed for a 
total of approximately 16 full school days for each of 
the 33 teachers before, during, and after the training. 

Ostensibly, the two modes of training delivery dif- 
fered considerably with respect to the teacher engage- 
ment with the training material. Nevertheless, these 
two training conditions had equivalent effects on class 
achievement on a vocabulary posttest. And together, they
were .69 of a standard deviation (SD) above the mean of
the control-group classes (p < .15), although there was
no comparable effect on a reading comprehension posttest.
Interestingly, minimally trained teachers were
found to implement more of the training recommendations
than the maximally trained teachers. This difference,
however, may be partly artifactual. The minimally trained
teachers were initially higher than the maximally trained
teachers on a measure of verbal fluency and a measure
of structuredness, both of which correlated positively
with implementation. However, a difference in implementation,
albeit a small one, remained after adjusting for
these initial differences.

Good and Grouws (1979). Forty lower-SES, fourth-grade
classes served as the context for this experiment.
The dependent variable was performance on the mathematics
subtest of a standardized achievement test administered
in mid-December; the same test administered in
September served as a covariate. The training procedures
were similar to those reported by Anderson et al. (1979). 

An introductory meeting was held in September for all 40
volunteer teachers and their principals. At this meeting,
the general nature of the study was outlined and, subsequently,
schools were randomly assigned to treatments.
The researchers then described the instructional model
to the 21 experimental-group teachers for approximately
90 minutes, and the 45-page manual was distributed.
Two weeks after treatment began, an additional 90-minute
meeting was held to answer questions about the program. 
Almost all of the teachers were observed on six occasions 
between October and the end of January. 

A class-level analysis of variance on residualized
gain scores indicated a treatment effect (p = .01) favoring
the experimental-group classes. Good (1979b) later
reported that the experimental-group classes still held
an advantage at the end of the school year when the district
carried out its regular testing, roughly three
months after formal observations were completed. That
a treatment effect was detected early in the school year
is noteworthy and, indeed, encouraging. Further, the persistence
of this effect for three months after classroom
observations were discontinued might be regarded tentatively
as evidence of the stability of the treatment
effect.
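
For readers unfamiliar with residualized gain scores, the following is
a minimal sketch of one common way to compute them: each class's
posttest mean minus the posttest mean predicted from its pretest mean
by a least-squares line. This is an assumption about the general
technique, not a reconstruction of the procedure in Good and Grouws
(1979). The data and the function name are hypothetical.

```python
from statistics import mean


def residualized_gains(pretest, posttest):
    """Posttest minus the posttest predicted from pretest by least squares."""
    mx, my = mean(pretest), mean(posttest)
    slope = (sum((x - mx) * (y - my) for x, y in zip(pretest, posttest))
             / sum((x - mx) ** 2 for x in pretest))
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(pretest, posttest)]


# Hypothetical class means; "E" = experimental group, "C" = control group.
pre = [30, 34, 38, 42, 31, 35, 39, 43]
post = [36, 41, 44, 49, 40, 45, 49, 53]
group = ["C", "C", "C", "C", "E", "E", "E", "E"]

gains = residualized_gains(pre, post)
for label in ("C", "E"):
    group_mean = mean(g for g, k in zip(gains, group) if k == label)
    print(label, round(group_mean, 2))
```

A class-level analysis of variance would then compare the two groups on
these residualized gains rather than on raw posttest scores.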

The findings of Crawford et al. (1978) regarding the 
minimally versus maximally trained teachers, the absence 
of an observation effect reported by Anderson et al. 
(1979), and the results of Good and Grouws (1979) led 
Good (1979b) to the following conclusion: 

Although more research on implementation is needed,
two tentative conclusions are warranted: (a) elaborate
delivery systems may not be necessary for
effectively training inservice teachers to perform
specifically identified classroom behaviors, and
(b) observation of teachers does not necessarily
have to be a part of the inservice training. (p. i:?) 

Stallings et al. (1979). This experiment differed
from the other three in two critical respects. First,
the context was junior and senior high school classes,
rather than elementary-grade classes. Second, and perhaps
more important, the training was accomplished through
comparatively intensive workshops. Despite these differ- 
ences, the study nevertheless is relevant to the present 
discussion in that, like the experiments discussed above, 
direct-instruction findings from previous correlational 
process-product research were put to experimental test 
in regular classes. 

Volunteer teachers of 22 junior and 24 senior high
school classes were randomly assigned to a training or
no-training condition. Students represented a broad
range of both ethnicity and locale. Each class was
observed for three consecutive days in the fall, winter, 
and spring. 

Four two-hour workshops were held for the 22 experimental-group
teachers after the fall observations had
been completed. In addition to extensive discussion and
role playing pertaining to "the direct approach to teaching"
(p. 6.10), observation-based feedback and recommendations
were provided for each trained teacher. An additional
two-hour workshop was held after the winter observations
had been completed. Finally, a teacher-requested
meeting was held in April for the experimental-group
teachers from all districts. The meeting, which lasted
a full day, provided teachers the opportunity to exchange
information.

The dependent variable was gain on the Comprehensive 
Tests of Basic Skills (CTBS) from the end of one year to 
the end of the next. (Complete CTBS data were available 
for the classes of 14 control-group teachers and 15 
experimental-group teachers.) The authors reported a
standardized mean-difference of .^2 SD in favor of the 
experimental-group classes. When calculated as recommended 
by Hedges (1980), the standardized mean-difference is
.13 SD — still, an encouraging value. Although no other 
analyses bearing on treatment effects were reported, one 
can compute a t ratio using the mean gains and standard 
deviations provided (Stallings et al., 1979, Table 31).
The resulting value is 1.19 (p < .15). 

A note on implementation. In these four experiments,
any treatment effect on student achievement clearly
is mediated by the extent to which the teachers implemented
the various instructional programs. That is, treatment
implementation is a necessary condition for subsequent
treatment effects on student achievement. Regardless
of how thoroughly a teacher may have read the furnished
materials, such treatment effects cannot be expected
where teaching processes have remained practically
unaltered. As Charters and Jones (1979) pointed out in
the context of program evaluation, "it is the use of new
instructional packages . . . that constitutes an innovation,
not the mere presence of the packages in the
classroom" (p. 6, emphasis in original).



Hedges (1980) argued that a standardized mean-difference,
or "effect size," is best computed by subtracting
the control-group mean from the experimental-group
mean, dividing the difference by the pooled within-group
standard deviation, and multiplying by a correction
factor based on the degrees of freedom represented in
the denominator.
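
The computation described in this note, along with the t ratio
mentioned in the text, can be sketched as follows. The correction
factor used here, 1 - 3 / (4 * df - 1), is a widely used approximation
and is an assumption, since the note does not give the formula. The
function names and the input values are hypothetical and are not the
values reported by Stallings et al. (1979).

```python
import math


def pooled_sd(sd_e, sd_c, n_e, n_c):
    """Pooled within-group standard deviation."""
    df = n_e + n_c - 2
    return math.sqrt(((n_e - 1) * sd_e ** 2 + (n_c - 1) * sd_c ** 2) / df)


def effect_size(mean_e, mean_c, sd_e, sd_c, n_e, n_c):
    """Standardized mean difference with a small-sample correction."""
    df = n_e + n_c - 2
    d = (mean_e - mean_c) / pooled_sd(sd_e, sd_c, n_e, n_c)
    correction = 1 - 3 / (4 * df - 1)  # assumed approximation
    return d * correction


def t_ratio(mean_e, mean_c, sd_e, sd_c, n_e, n_c):
    """Independent-groups t ratio from summary statistics."""
    standard_error = pooled_sd(sd_e, sd_c, n_e, n_c) * math.sqrt(1 / n_e + 1 / n_c)
    return (mean_e - mean_c) / standard_error


# Hypothetical mean gains and standard deviations for 15 experimental-group
# and 14 control-group classes.
print(round(effect_size(10.0, 8.0, 4.0, 4.0, 15, 14), 3))
print(round(t_ratio(10.0, 8.0, 4.0, 4.0, 15, 14), 3))
```

With group sizes this small, the correction shrinks the uncorrected
value only slightly; the larger influence is the choice of standard
deviation in the denominator.
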
Assume that a given sample of teachers is reasonably
motivated to conform to teaching recommendations — not an 
implausible assumption with volunteer teachers (e.g., 
Rosenthal & Rosnow, 1975). Then the question can be
asked, What influences actual implementation? Is there
inter-recommendation variation with respect to implemen- 
tation? 

The results on implementation from these four experiments
lend support to what Doyle and Ponder (1977) called
"the practicality ethic." In their discussion of teachers' 
reactions to change proposals regarding instructional 
practices, Doyle and Ponder (1977) held that "the study 
of the practicality ethic is the study of perceived 
attributes of messages and the way in which these percep- 
tions determine the extent to which teachers will attempt 
to modify classroom practices" (p. 2).

A judgment concerning this ethic is shaped by three 
criteria: instrumentality, or the extent to which the
change proposal is stated clearly and with "procedural
specifications"; congruence, or the extent to which the
proposal "is congruent with perceptions of [the teachers'] 
own situations"; and cost, or "the ease with which a pro- 
cedure can be implemented and the potential return" 
(pp. 7-8).

Anderson et al. (1979) reported that successfully 
implemented recommendations tended to be those that
described specific skills and focused on familiar, though
not necessarily relied upon, behaviors. Similarly, Good 
and Grouws (1979) found that behaviors involving specific 
requests were more successfully implemented. And Crawford 
et al. (1978) reported that the less successfully implemented
recommendations tended to be more global and nonspecific.
These findings seem to reflect operation of the
instrumentality criterion. 

When Stallings et al. (1979) concluded that "it is 
difficult to get teachers to try something they have 
opinions against" (p. 7.4), it would seem that these authors 
were addressing the congruence criterion. Similarly, 
Ebmeier and Good (1979), having analyzed further the data 
from the study conducted by Good and Grouws (1979), re- 
ported that teachers who already "believed" in an instruc- 
tional model like the one being introduced through the 
training were characterized by greater implementation. 
Anderson et al. (1979) found that successfully implemented 
behaviors "had a rationale based on other classroom pro- 
cesses or student outcomes that made sense to teachers" 
(p. 219). It could be argued that this rationale pro- 
vided congruency for the teachers. And to the extent that 
such a rationale furnishes information regarding the poten- 
tial return of implementation vis-a-vis student outcomes, 
the teachers perhaps are able to make a judgment concerning 
the cost criterion. 

Summary. In contrast to correlational process-product
research, the results of these four experiments
allow tentative statements about causality, rather than
mere association. This experimental research indicates
that training teachers to adopt a direct-instruction approach
to teaching can result in positive change, when
compared to untrained teachers, in both training-related 
teacher behavior and student achievement. Further, the 
findings of Anderson et al. (1979), Crawford et al. (1978),
and Good and Grouws (1979) have suggested that such change 
does not necessarily require extensive investment on the 
part of the researcher — that positive results can be ob- 
tained through a minimal intervention. Referring to these 
three experiments, Good (1979a) held that "these studies
illustrate that teachers can be taught direct instructional
principles in relatively simple training programs that
lead to changes in teachers' classroom behavior and student
achievement" (p. 9).

The Role of the Present Study

The present study constituted a minimal intervention. 
Unlike the four experiments discussed above, this inter- 
vention was minimal in that (a) the treatment consisted 
solely of mailing training materials to the experimental-group
teachers and (b) only a limited number of brief
classroom observations were conducted. 

As will be discussed in greater detail below, the 
teacher training in the present study took the form of 
the "minimal training" condition in Crawford et al. (1978).
The latter study, however, involved frequent and lengthy 
classroom observations. And although Anderson et al. 
(1979) explicitly addressed the question of classroom 
observations as a moderator of treatment effects on 
student achievement, part of the training in that study 
involved attending two meetings with project staff to 
discuss the project and training materials. Good and 
Grouws (1979) provided a total of ten hours of work- 
shops during the training period and conducted extensive 
observations, as well. 

Thus, although encouraging claims have been made 
concerning the feasibility of a minimal intervention 
(e.g., Good, 1979a, 1979b; Good & Grouws, 1979), such
an intervention had not yet been undertaken. That is, 
no intervention had been minimal with respect to both 
the delivery of training and the conduct of classroom 
observations. The present study was such an intervention.

Sample 

The initial sample comprised 33 volunteer teachers 
and their fourth-, fifth-, and sixth-grade students in
a large, urban school district in the San Francisco
Bay Area. Because of subsequent complications, the
number of classes on which the achievement-data analyses
were based was reduced to 28. There were 966 students
in the 28 classes, 631 of whom had both pretest and
posttest achievement scores. All analyses bearing on 
academic achievement were based on these 631 students. 

Most of the students (85%) in the sample were either
Black (65%) or Caucasian (20%). As for occupational
status, roughly 67% of the parents had "skilled" occupations
or lower, with nearly one in five being unemployed
or receiving AFDC. Approximately 20% had occupations 
rated as "professional" or "white collar." (See Coladarci 
[1980] for description of the occupational-rating instrument.)

Instruments and Procedures

The instruments in this study were the teacher edu- 
cation packets, the classroom observation schedule, and 
the Comprehensive Tests of Basic Skills. 

The Teacher Education Packets

The teacher education packets (TEP) were developed
and used by Crawford et al. (1978) in their study. As
mentioned above, the TEP contained recommendations for
teaching that were based on the large-scale correlational
studies conducted by Brophy and Evertson (1974),
McDonald and Elias (1976), Soar (1973), and Stallings
and Kaskowitz (1974). The thousands of process-product
correlations presented in the technical reports of
these studies were examined and considered as the
basis for prescriptive statements (see Crawford et al.,
1978, Vol. I, pp. 25-31). There were several requirements
for a particular process-product correlation coefficient:
(a) The product variable had to be reading achievement;
(b) the correlation coefficient had to be statistically
significant (p < .05); and (c) the process variable had
to be operationally defined.

Information sheets were prepared for each process- 
product correlation satisfying these conditions (see 
Crawford et al., 1978, Vol. I, Appendix A). Each sheet
reported the process variable's operational definition,
mean, standard deviation, and metric, along with the
process-product correlation coefficient and interpretation
of this coefficient. The operational definition
of a variable was important in assessing the comparability
of its meaning across studies. And the mean and standard
deviation of a variable were necessary in estimating the
desirable level of the variable in practice. Where a
variable correlated positively with reading achievement,
the desirable level was set at one standard deviation
above its mean. Conversely, the desirable level was
set at one standard deviation below its mean for any
variable that correlated negatively with reading achievement.
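
The rule for setting a desirable level can be written as a short
sketch. The function name and the example values are hypothetical; only
the rule itself (one standard deviation above or below the mean,
depending on the sign of the correlation) comes from the description
above.

```python
def desirable_level(mean, sd, correlation):
    """Target level for a process variable, given its process-product correlation."""
    if correlation > 0:
        return mean + sd  # positively related to reading achievement
    if correlation < 0:
        return mean - sd  # negatively related to reading achievement
    raise ValueError("a zero correlation gives no direction for a recommendation")


# Hypothetical variable observed at a mean of 12.0 with SD 3.0.
print(desirable_level(12.0, 3.0, 0.25))   # 15.0
print(desirable_level(12.0, 3.0, -0.25))  # 9.0
```

Because only correlations that were statistically significant qualified,
the zero case should not arise in practice; the error branch is
included only to make the sketch complete.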

The interpretations of the 125 qualifying correlation
coefficients provided the basis for three packets
of teaching recommendations, each packet corresponding
to a general area of teaching: behavior management and
classroom discipline, instructional methods, and questioning
and feedback. Table 1 presents the number, and
source, of variables represented in each of these three
categories. The contents of each packet will be briefly
discussed here.

Behavior management and classroom discipline. This
packet is based on the findings that classes characterized
by a general unruliness and a poorly articulated system
of rules are also characterized by frequent nonengagement
in academic activities and student difficulty in
attending to academic tasks. Teachers are informed of
ways to manage their classes, largely in the light of
Kounin (1970; also see Brophy & Putnam, 1979).

The packet cautions teachers against disciplinary
errors that prolong or compound the problem--specifically,

Table 1. The Number and Source of Variables for Each of
the Teacher Education Packets' Categories

Category                     Number of Variables   Study

Behavior Management and               6            Brophy and Evertson
Classroom Discipline                  3            Stallings and Kaskowitz
                                      2            McDonald and Elias
                                      2            Soar

Instructional Methods                16            Stallings and Kaskowitz
                                     11            McDonald and Elias
                                                   Soar
                                      6            Brophy and Evertson

Questioning and                      50            Brophy and Evertson
Feedback Strategies                  17            Stallings and Kaskowitz
                                      4            McDonald and Elias
                                      2            Soar

Source: Crawford & Stallings (1978)

the disciplinary errors regarding "timing" and "target."
Further, this packet encourages teachers to develop a
"system of rules," which lets students know--without
always having to consult the teacher--what they can
and should do during a given period. Finally, to curb
misbehavior as well as to identify and respond to students
in need of assistance, teachers are encouraged
to monitor activities when students are engaged in seatwork.

In short, teachers are encouraged to develop what 
Kounin (1970) called "withitness"; through monitoring
and vigilance, teachers develop a keen awareness of their
class--who is and is not academically engaged, who needs
assistance, who is misbehaving (indeed, who is about to
misbehave), and so on.

Instructional methods. This packet highlights the
importance of large-group instruction, frequent use of
question-and-answer sessions, and use of visual aids
and phonics exercises in reading activities. Additionally,
with seatwork assignments, this packet informs teachers
of the importance of assigning work of appropriate difficulty,
using textbooks and workbooks (rather than games,
toys, and machines), and minimizing through proactive
planning the amount of time devoted to organizing and
giving directions.

Questioning and feedback strategies. This packet
pertains to the manner in which the teacher selects students
to respond to questions, the difficulty level of
the questions asked, and the provision of feedback subsequent
to the student's response. A summary listing of
TEP recommendations is presented in Table 2.

An introductory packet briefly discussed the TEP's
rationale and provided a classroom vignette illustrating 
a teacher whose practices largely conformed to the TEP 
recommendations. A fifth packet reviewed and summarized 
the preceding packets. A sixth packet presented an 
additional classroom vignette, illustrating teaching 
practices that were both consistent and inconsistent 
with the TEP recommendations. To measure knowledge
obtained, teachers were asked to respond to 24 different
scenarios in terms of the extent to which the particular
sequence of events conformed to the TEP recommendations.
The six packets individually were mailed to experimental-group
teachers in December and January.

Finally, the teachers received three refresher sheets
corresponding to the second, third, and fourth packets,
respectively. These sheets were intended to provide a
succinct and accessible review of the contents of the three
packets and were mailed, one per week, over a three-week
period beginning in mid-February. 

Table 2. Summary of TEP Recommendations

Additional quizzes, general questions, and rating
forms were included in the TEP, as well. Covering the
main points and recommendations in the respective packets,
the quizzes had either a multiple-choice or sentence-completion
format. Answer keys were provided for all
but one of the quizzes, the exception being a final,
comprehensive quiz. General questions were structured
in an open-questioned format and covered the teachers'
TEP-related opinions, attitudes, and practices. The
rating forms called upon the teachers to estimate the
frequencies in their classes of various activities and
events that were discussed in the packets. Although not 
included here, completed analyses of these data are dis- 
cussed in Gage and Coladarci (1980) and Mohlman, Coladarci, 
and Gage (1980).

Classroom Observation Schedule

This instrument, adapted from a measure used by 
Crawford et al. (1978), is a 4-page record of observer 
judgments and estimates on both low-inference and high- 
inference variables. For example, it contains items 
pertaining to the number of times the "teacher teaches 
groups of 8 or more pupils at a time" and "teacher calls 
pupil by name before asking question" (low inference), 
as well as items regarding the degree to which there exists
"effective use of system of rules by teacher" and "communication
of awareness to pupils by teacher" (high inference).

Each of the 26 items in the observation record
reflects components of the TEP (see Table 3). The
alternatives for each item in the observation schedule 
were scored so the highest value represented the highest 
degree of conformity to the recommendations for that 
item and the lowest value represented the lowest degree 
of conformity. Thus, the observation records yielded
a rough estimate of the extent to which the teaching practices
of both experimental- and control-group teachers, before and
after training, reflected the TEP recommendations.

Each teacher was observed on four two-hour occasions:
twice in the fall and twice in the spring. Thus, in the
present design, teachers and occasions were crossed. Each 
observer, however, did not observe all teachers; hence, 
observers were nested within teachers. Further, because 
not all observers observed on each of the four occasions, 
observers similarly were nested within occasions. The 
observation design is represented by the schematic in 
Figure 1. 

The Comprehensive Tests of Basic Skills 

The Comprehensive Tests of Basic Skills (CTBS), a 
nationally standardized test of academic achievement, 
served as the dependent measure. In 1976, the school 
district's committee on test selection chose the CTBS 
for regular use in the district. Of five standardized

Table 3. Summary Listing of Categories of the Classroom Observation Record

 1. Target errors
 2. Timing errors
 3. Overreactions by teacher
 4. Effective use of system of rules by teacher
 5. Teacher awareness of behavior problems
 6. Communication of awareness to pupils by teacher
 7. a) Teacher wrote daily schedule on chalkboard
    b) Teacher made use of written schedule
 8. Teacher calls pupil by name before asking question
 9. Teacher accepts call-outs during question-answer sessions
10. Teacher encourages pupil-initiated questions and comments
11. Length of teacher's public feedback in reading group
12. Teacher monitors pupils' individual and small-group work
13. Teacher teaches groups of 8 or more pupils at a time
14. Teacher uses visual demonstrations in teaching reading
    or other academic subjects
15. Teacher uses phonics exercises in teaching reading
16. Students use textbooks, workbooks, paper-pencil
    activities, etc.
17. Teacher's amount of direction-giving and organizing
18. Positioning of reading or math group: pupils'
    backs toward the rest of class
19. Total time in private, personal matters with one
    child at a time
20. a) Number of times the pupils' academic answers
       were partly correct
    b) When there is a partly correct answer, estimate
       the number of times the teacher went ahead and
       gave the right answer
21. Teacher's task orientation towards the defined task
22. Teacher's positive affect toward one or more children
23. Teacher's negative affect toward one or more children
24. Attention of students
25. Noise level of classroom

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achievement tests that were evaluated for district use, 
this test was judged as providing the best match between 
test content and local curricula. 

Test scores from the spring 1978 and spring 1979 
administrations were recorded from computer printouts 
provided by the school district's research department. 
The 1978 data and the 1979 data served as the pretest
and posttest, respectively. Each student's test scores
were combined to yield a reading total score and a 
mathematics total score which, in turn, were combined 
as a total score. 

Assigning Teachers to Experimental Conditions 

Teachers were assigned to the control group or experimental
group in the following manner: (a) Grade-equivalent
means on the CTBS pretest were computed for each
class; (b) scatterplots² were made, displaying the joint
distribution of CTBS means and fall
conformity-to-recommendations means; (c) teachers were paired in each
scatterplot according to proximity; and (d) at the toss of
a coin, one teacher in the pair was assigned to the
experimental group, and the other teacher to the control
group.



² These scatterplots were made separately for fourth-grade
classes, fifth-grade classes, and fourth-fifth combination
classes. The decision to add sixth-grade students to
the sample was made after teachers were assigned to the
experimental conditions.

Experimental-group teachers were asked to become
familiar with the contents of the TEP and, further, to
follow the various recommendations in their teaching.
Additionally, these teachers were asked to complete the
quizzes, open-ended questions, and rating forms associated
with each of the packets, returning their responses by
mail. Project staff did not meet with any teachers
to discuss the training materials or to encourage
implementation.

Results: Implementation of Training

The statistical analyses focused on three main questions.
The first question concerns implementation of
the training recommendations: Did the intervention
appreciably alter the training-related teaching practices of
experimental-group teachers? The second question concerns
treatment effects on student achievement: Did the
intervention produce significant increments in academic
achievement for the students in the experimental-group classes?
The third question addresses the study as a correlational,
or process-product, one: Irrespective of experimental
condition, was there a positive relationship between
teachers' conformity-to-recommendations and student
achievement? This section presents the analyses and results
associated with the question of treatment implementation.
The second and third questions are covered in the following
sections.

As was noted above, each of the four observations
was coded to yield a total score representing a teacher's
general conformity-to-recommendations (CTR). The two
fall CTRs were averaged for each teacher, as were the
two spring CTRs. It was the CTR total, rather than
the CTR item, that was emphasized in the analyses of the
observation data. (For analyses of item data, see
Coladarci [1980].) Analyses that employed the CTR total
were considered more meaningful for several reasons.
First, this study, as will be recalled, represented
a minimal intervention. Here, one of the defining
characteristics of such an intervention was the limited number
of brief classroom observations. To propose a
comprehensive item-level analysis of the observation data seems
inconsistent with the stated purpose, as well as with the
design, of the study. Second, as Crawford and Stallings
(1978) pointed out, the most compelling and defensible
analysis is one of the program as a whole (i.e., total
CTR) simply because the discrete teaching recommendations
were not independently manipulated. While analyses that
focus on the discrete teacher behaviors may prove
intriguing, the inevitable intercorrelation among those behaviors
renders problematic any clear and meaningful interpretation.
And, third, as the sum of n positively correlated items
has greater reliability than each item considered
individually, larger and hence more meaningful differences
are more likely to be found with the CTR total.

The first task was to explore the reliability- 
stability of the CTR. Then, group differences on CTR 
were examined to assess treatment implementation. 
Reliability-stability of CTR 

The correlation between the two fall observations, 
the two spring observations, or the mean fall and mean 
spring observation represents at once (a) the reliability 
of the observers and (b) the stability of the teachers' 
behavior. That is, (a) what would be the agreement
between the observations of two individuals if they had
observed the same teacher on the same occasion? And
(b) how similar would a teacher's observed behavior be
over two occasions if observed by the same individual?
With the observation design of the present study (see
Figure 1), those two sources of variance are completely
confounded.

Given the importance of establishing the
reliability-stability of the CTR and, further, because it is somewhat
independent of the question of treatment effects on
student achievement, analyses assessing CTR
reliability-stability were conducted on the original sample of 33
teachers as well as on the final sample of 28 teachers
(i.e., those for whom adequate CTBS data were available).

Table 4 presents, for the full sample, the means,
standard deviations, and intercorrelations for total CTR
corresponding to each of the four observation occasions.
The two fall CTRs are moderately correlated (r = .27),
as are the two spring CTRs (r = .27), while the remaining
correlations are much smaller and changing in sign.

The reliability of the sum of the two fall CTRs and
of the two spring CTRs can be estimated by applying the
Spearman-Brown formula. The estimated reliability of the
fall sum (hereafter, "fall total CTR") becomes .43, which
is the same estimate for the spring sum (hereafter,
"spring total CTR"). These estimates are equivalent to
generalizability coefficients (Cronbach, Gleser, Nanda,
& Rajaratnam, 1972). As such, they represent the ratio
of between-teacher variance to the total observed-score
variance, the latter comprising both between-teacher
variance and the nested combination of variance attributable
to interactions involving teachers, occasions, and
observers (see Figure 1).
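
As a quick check on these estimates, the Spearman-Brown formula for the reliability of a sum of k parallel measurements, k·r / (1 + (k - 1)·r), can be evaluated directly. The following is only an illustrative sketch; the function name is hypothetical and is not taken from the study.

```python
def spearman_brown(r, k=2):
    """Estimated reliability of the sum of k parallel measurements
    whose pairwise correlation is r (Spearman-Brown formula)."""
    return k * r / (1 + (k - 1) * r)


# r = .27 is the correlation between the two fall observations (and
# between the two spring observations) in the full sample.
fall_total_reliability = spearman_brown(0.27)
print(round(fall_total_reliability, 2))  # 0.43
```

With k = 2 and r = .27 the formula gives 2(.27) / 1.27, which rounds to the .43 reported in the text.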

The fall total CTR and spring total CTR are virtually 
uncorrelated (r = -.01). Because this correlation was 
calculated with experimental conditions pooled, the zero 
correlation might suggest the influence of the training 
on the experimental-group teachers. The fall-spring
correlations for control-group and experimental-group
teachers are .21 and -.14, respectively.

Table 4

Conformity-to-Recommendations (CTR) Total Scores:
Means, Standard Deviations, and Intercorrelations,
Experimental Conditions Pooled
(N = 33)

Occasion            M      SD       2      3      4
1  Fall 1         70.5   10.0     .27   -.19   -.07
2  Fall 2         72.6    8.4            .07    .23
3  Spring 1       71.5    9.0                   .27
4  Spring 2(a)    67.8    8.2

(a) N = 32.

Table 5

Conformity-to-Recommendations (CTR) Total Scores:
Means, Standard Deviations, and Intercorrelations,
Experimental Conditions Pooled
(N = 28)

Occasion            M      SD       2      3      4
1  Fall 1         71.8    9.9     .18   -.36*  -.09
2  Fall 2         73.1    8.4            .03    .21
3  Spring 1       72.0    8.4                   .16
4  Spring 2       67.4    8.2

* p < .10

Although the fall-spring correlation is essentially 
zero, the generalizability coefficient representing CTR 
scores across four occasions is .31. Because the former 
is based on two occasions whereas the latter is based on
four occasions, it is not surprising that the latter is
larger. The corresponding within-group generalizability
coefficients are .47 and .16 for the control group and
experimental group, respectively. Again, the difference
between these two values suggests some influence of the
training.

CTR means, standard deviations, and intercorrelations 
for the restricted sample are presented in Table 5. With 
the reduction in sample size, there is a concomitant
reduction in the magnitude of the correlations between the
two fall CTRs and, similarly, between the two spring CTRs.
The former is reduced from .27 to .18, while the latter
is reduced from .27 to .16. With the Spearman-Brown formula applied,
the estimated reliabilities of the fall total CTR and
spring total CTR are .31 and .28, respectively; these
reliability estimates are disappointingly low.

Additional information concerning reliability is 
obtained from the alpha coefficient (e.g., Cronbach,
1970), a measure of internal consistency. Table 6
presents the alpha coefficients associated with each of
the four observation occasions. Ranging from .66 to .76,
these coefficients reflect respectable degrees of internal
consistency.

Table 6

Conformity-to-Recommendations (CTR) Total Scores:
Alpha Coefficients by Observation Occasion,
for Full and Restricted Samples

Occasion      Full Sample    Restricted Sample
              (N = 33)       (N = 28)
Fall 1          .76            .76
Fall 2          .67            .66
Spring 1        .69            .68
Spring 2        .69            .70

Note: In calculating the alpha coefficients, missing item
data were replaced with the item mean for the particular
observation occasion.
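
For readers less familiar with coefficient alpha, the sketch below shows how it can be computed from a teachers-by-items score matrix, with missing item scores replaced by the item mean as described in the note to Table 6. The data and function names are hypothetical, and the use of sample (n - 1) variances is an assumption; this is an illustration, not the program used in the study.

```python
from statistics import mean, variance


def impute_item_means(rows):
    """Replace missing item scores (None) with that item's mean."""
    n_items = len(rows[0])
    item_means = [
        mean(row[i] for row in rows if row[i] is not None)
        for i in range(n_items)
    ]
    return [
        [item_means[i] if row[i] is None else row[i] for i in range(n_items)]
        for row in rows
    ]


def cronbach_alpha(rows):
    """Coefficient alpha for a teachers-by-items score matrix."""
    rows = impute_item_means(rows)
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical scores: four teachers (rows) on three items (columns).
scores = [
    [3, 4, 3],
    [2, 2, None],  # one missing item score
    [4, 5, 5],
    [1, 2, 1],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 2))
```

Alpha rises as the items covary more strongly relative to their individual variances, which is why it is read as an index of within-occasion internal consistency.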

Summary. When based on the full sample, total CTR
evidences moderate reliability-stability both across the
two fall occasions and across the two spring occasions.
Indeed, when one considers the context in which
reliability-stability was determined (the nature of the classroom
observation instrument, the infrequency and brevity of
classroom observations, the inherent variability of
teacher behavior), the obtained correlations are almost
impressive. When based on the restricted sample, however,
these correlations are reduced substantially; the
correlations apparently could not withstand a further
reduction in size of an initially small sample. Finally,
whether based on the full or restricted sample,
within-occasion measures of internal consistency are relatively
high.

Group Differences on CTR

Because classes were randomly assigned to the
experimental conditions, the difference in fall CTR between the
control group and experimental group was expected to be
practically negligible. Inasmuch as classes were randomly
assigned to experimental conditions and, further,
pretraining and posttraining observations were conducted,
the suitability of the analysis of covariance (ancova) was
initially entertained. The utility of ancova in such a
design would lie more in the consequent reduction of
error variance than in the posttest adjustment for initial
differences on the pretest (e.g., Linn & Slinde, 1977).
Reducing the error term, of course, results in a more
sensitive statistical test; but, because this reduction
increases as the magnitude of the pretest-posttest
correlation increases, ancova is of little use where this
correlation is less than approximately .40 (Elashoff,
1969). Such a correlation was not expected (and,
ultimately, not obtained) between the fall and spring CTR
and, consequently, the use of ancova was considered
unwarranted. Rather, t ratios were computed for the difference
between the spring CTR means.

Table 7 presents the means and standard deviations
of the CTR totals, by experimental condition. Although
favoring the experimental group, the spring difference in
CTR between the two groups is not statistically
significant. These data indicate that, as a whole, treatment
implementation was poor: Training-related teaching
practices of the experimental-group teachers were not altered
appreciably.

Table 7

Conformity-to-Recommendations (CTR): Within-Group Means and
Standard Deviations for the Fall, Spring, and Full-Year Observations
for the Full Sample (N = 33) and the Restricted Sample (N = 28)

                      Control           Experimental
CTR                   M       SD        M       SD        t(b)

Full Sample           (N = 16)          (N = 17)
  Fall              72.38    7.14     70.71    7.67      -.65
  Spring(a)         69.19    5.85     70.56    7.70       .57
  Full Year         70.78    5.0?     70.59    5.11      -.11

Restricted Sample     (N = 13)          (N = 15)
  Fall              74.15    5.39     70.97    8.07     -1.20
  Spring(a)         68.92    3.95     70.40    7.94       .61
  Full Year         71.54    3.40     70.68    5.28      -.50

(a) N = 16.
(b) t was computed for the difference between uncorrelated
means because the correlation between paired experimental-
and control-group teachers' CTR scores was only .17 in the
fall and .2H in the spring. The ts for correlated means
had to be based on fewer teachers and were, in any case,
essentially the same in magnitude and statistical
significance as those reported here. No t reported here is
statistically significant (α = .05).

The mean differences presented in Table 7, however,
can be examined at a descriptive level. For the full
sample, the experimental-group pretreatment CTR mean falls
below the corresponding control-group mean (-1.67 raw-score
points, or -.23 SD). After treatment, in contrast,
the experimental-group mean is slightly above the mean of
the control group (1.37 raw-score points, or .20 SD).
These small differences are more pronounced in the
restricted sample, where the standardized mean-differences
are -.46 SD and .21 SD, respectively.
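
The full-sample figures can be reproduced, within rounding, from the means and standard deviations in Table 7. The sketch below assumes that the standardized differences were expressed in pooled within-group SD units and that the t ratios used the pooled-variance formula for uncorrelated means; both assumptions are consistent with the reported full-sample values but are not stated in the text. Function names are hypothetical.

```python
from math import sqrt


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled within-group standard deviation for two independent groups."""
    return sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def standardized_difference(m_exp, m_ctl, sd_exp, n_exp, sd_ctl, n_ctl):
    """Experimental-minus-control mean difference in pooled-SD units."""
    return (m_exp - m_ctl) / pooled_sd(sd_exp, n_exp, sd_ctl, n_ctl)


def t_uncorrelated(m_exp, m_ctl, sd_exp, n_exp, sd_ctl, n_ctl):
    """Pooled-variance t ratio for the difference between uncorrelated means."""
    se = pooled_sd(sd_exp, n_exp, sd_ctl, n_ctl) * sqrt(1 / n_exp + 1 / n_ctl)
    return (m_exp - m_ctl) / se


# Full-sample values from Table 7 (control N = 16, experimental N = 17).
fall = dict(m_exp=70.71, m_ctl=72.38, sd_exp=7.67, n_exp=17,
            sd_ctl=7.14, n_ctl=16)
spring = dict(m_exp=70.56, m_ctl=69.19, sd_exp=7.70, n_exp=17,
              sd_ctl=5.85, n_ctl=16)

for label, row in (("fall", fall), ("spring", spring)):
    print(label,
          round(standardized_difference(**row), 2),
          round(t_uncorrelated(**row), 2))
```

Under these assumptions the fall contrast comes out near -.23 SD with t near -.65, and the spring contrast near .20 SD with t near .57, matching the full-sample entries in Table 7.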

Also, although the experimental-group pretreatment
CTR is slightly lower than the corresponding
control-group CTR and, further, the experimental-group
posttreatment CTR is slightly higher than the corresponding
control-group CTR, CTR in each group declines from fall to spring.
This is considerably more marked for the control group,
however: Raw change in CTR from fall to spring for
control-group teachers is -5.23, whereas the corresponding figure
for experimental-group teachers is -.57. This aspect of
the mean differences in total CTR suggests that the effect
of the teacher training may have been to retard a decline
from fall to spring in the incidence of training-related
teaching practices among experimental-group teachers.

While perhaps encouraging, these trends were not likely
to have made any significant difference, statistically
or practically, in the end-of-year academic achievement
of the students in the control and experimental groups.

Results: Treatment Effects on Achievement

The impact of the TEP on student achievement was
examined using the Johnson-Neyman technique (J-N) (e.g.,
Rogosa, 1980). This technique, an alternative to
conventional analysis of covariance (ancova) for assessing
treatment effects, is especially useful in comparing
nonparallel regression lines. Because J-N is generally
not as familiar as ancova, the two approaches will be
briefly discussed side by side. The outline of the
analyses that were performed will follow.

Basically, ancova is a combination of analysis of
variance (anova) and regression analysis: The differences
between posttest means are examined in conjunction with
the posttest-on-pretest pooled regression. In a
two-group pretest-posttest randomized design, for example,
ancova evaluates the posttest mean-difference after taking
into consideration between-group variance on the
pretest, or covariate. Such a procedure has two important
advantages over an anova on the posttest means alone:
(a) The posttest mean-difference is adjusted for any
mean difference on the pretest, and (b) because
pretest variance is removed from the error term and
explicitly incorporated into the analysis, the reduced
error variance results in more precision for the comparison
of the within-group³ regressions. This increased
precision results in a greater probability of rejecting
the null hypothesis (i.e., power) when, in the
population, the null hypothesis does not hold.

A major assumption of the ancova model is that the
covariate and treatment do not interact; that is, it is
assumed that the population within-group regressions
are parallel, or homogeneous. If significance tests
indicate heterogeneity of regressions, ancova should not
be used. But, because such statistical tests typically
lack sufficient power to detect significant differences
in slope (e.g., Cronbach & Snow, 1977), failure to
reject the null hypothesis from a perfunctory test for
homogeneity of regressions does not ensure that the
population within-group regressions are homogeneous. J-N
makes no assumption regarding a covariate-treatment
interaction and, according to Mendro (1975), "offers possibly
the only satisfactory alternative to [ancova] when group
regression coefficients are unequal" (quoted in Rogosa,
1977, p. 2). This technique, in contrast to ancova,
establishes "regions of significance" on the covariate
in which there is a statistically significant difference
between treatments. Rather than asking the ancova
question concerning the treatment effect, J-N asks the
question: For what range of X, the covariate, does a
significant treatment effect exist? The distinction
between the two questions is nontrivial.

³ Here, the term "within-group" refers to the respective
treatment group. Thus, a regression is computed separately
within the experimental group and within the control group.

A brief example, in the context of the present
study, may help clarify the distinction between these two
methods. X and Y are the pretest and posttest, respectively,
and there are two experimental conditions:
Experimental-group teachers receive the TEP, and control-group teachers
do not. The ancova model assesses the treatment effect
by looking at the group differences on Y with X as a covariate.
Essentially, Y is regressed on X and the residuals (what
is not predicted by X) are examined for any treatment effect.
Any obtained treatment effect is assumed to be constant
over all levels of X; that is, the two slopes are assumed
to be parallel (see Panel a of Figure 2).

The assumption is made, then, that the TEP has a
relatively uniform impact on school achievement,
regardless of whether the classes are low, medium, or high on
entering ability.⁴ If this is not the case (if, in fact,
treatment and covariate interact), a major assumption
underlying ancova is violated and alternative methods
should be considered. If ancova is used, the results
may be very misleading, depending on the degree of
interaction present.

⁴ Here, the term "ability" is used loosely, referring to
the general achievement level of the class at the
beginning of the school year.

Figure 2. Comparing within-group regression lines:
(a) parallel slopes, (b) treatment and covariate interact,
(c) Johnson-Neyman technique. [Labels in Panel c: D(X),
hyperbola, region of significance.]

Imagine that an interaction exists between
treatment and covariate such that the TEP has a large effect
in low-ability classes, but a virtually negligible
effect in high-ability classes (see Panel b in Figure 2).
If ancova is used, a single vertical distance between the
two within-group regressions is assessed and, in turn,
used as an estimate of the adjusted mean-difference on Y.
This difference, however, is evaluated at the value of X
corresponding to the weighted average of the two group
means; as such, this difference can be thought of as an
"average" or "overall" treatment effect (Rogosa, 1980).
In the context of the interaction presented in Panel b
of Figure 2, the reported treatment effect would be
misleading, indeed: It would underestimate the treatment
effect for low-ability classes and overestimate the
treatment effect for high-ability classes. The problem is
especially pronounced where the lines of a markedly
disordinal interaction cross in the middle of the range of
X: The adjusted mean-difference on Y would be roughly
zero (i.e., because this adjustment is made near the point
at which the lines cross), while an examination of the
vertical differences between the regression lines at
low and high values of X would result in a drastically
different impression. Clearly, with nonparallel straight
lines, reports only of the vertical distance are neither
very meaningful nor very compelling (Rogosa, 1980).

J-N, on the other hand, does not require the
assumption of homogeneity of regressions. A line, D(X), is
determined that represents the vertical distance (D),
for a given range of X, between the two sample
within-group Y-on-X regressions. "For the comparison of the
within-group regressions, D(X) is the key summary of the
data" (Rogosa, 1980). To assess statistical significance
for D(X), a simultaneous confidence band is constructed
for the difference of the population within-group
regressions. This band is bounded by hyperbolae (see
Panel c of Figure 2).

The simultaneous version of J-N identifies regions⁵
on X in which there is a statistically significant
difference between the two sample within-group regression
lines. These regions are identified by the manner in
which the confidence band intersects the X axis: Values
of X that fall outside the confidence band are in the
region of significance. In Panel c of Figure 2, where
D(X) is based on the interaction presented in Panel b,
one sees that this region covers low-ability classes
to classes moderately high in ability. The region of
significance does not extend beyond this point. The
conclusion would be that the teacher training had an
appreciable impact on achievement for this particular
range of X, but not beyond. It is in this manner that
the results of J-N can be more revealing and meaningful
than an ancova estimate of the "average" treatment effect.

⁵ Before constructing the simultaneous confidence band,
one can conduct a preliminary test to see whether or not
any regions exist. See Rogosa (1980) or, for an
alternative formula, Serlin and Levin (1980).

Procedures and Outline of Analyses

Each class mean was weighted by the corresponding
number of students on which the mean was based. Such
weighting takes into account differences in precision of
class means that result from different class sizes.
Weighting is especially advisable when there is marked
variability in this class characteristic (Cronbach, 1976,
pp. 4.7-4.11; Cronbach & Webb, 1975).

First, the two within-group regressions were compared.
The model for these regressions, computed separately
within the experimental group (subscript E) and the
control group (subscript C), is:

\[
\begin{aligned}
Y_j &= \mu_E + \beta_E X_j + \varepsilon_j, \quad j = 1, \ldots, n_E \\
Y_j &= \mu_C + \beta_C X_j + \varepsilon_j, \quad j = n_E + 1, \ldots, N
\end{aligned}
\tag{1}
\]

where $\mu$ and $\beta$ represent, respectively, the intercept
constant and the slope associated with the posttest (Y)
on pretest (X) regression. The within-group regressions
can be combined as:

\[
Y_j = \beta_1 + \beta_2 T_j + \beta_3 X_j + \beta_4 T_j X_j + \varepsilon_j,
\quad j = 1, \ldots, N
\tag{2}
\]

where the new terms are $\beta_2$, the regression coefficient
associated with the treatment T (the latter being a
dummy variable coded 0 [control] or 1 [experimental]),
and $\beta_4$, the regression coefficient associated with the
interaction of treatment T and covariate X. $\beta_4$ is
equivalent to the difference between the two
within-group regression coefficients; in the context of
Equation 1, $\beta_4 = \beta_E - \beta_C$. $\beta_1$ is the intercept constant
and is equivalent to $\mu_C$ in Equation 1. $\beta_3$ is the
regression coefficient associated with the covariate, and is
equal to $\beta_C$ in Equation 1. Parameters in Equation 2 were
estimated by ordinary least squares; analogous relations
hold for the sample quantities. (Sample estimates are
denoted by the lower-case "b.")

Again, J-N involves the calculation of a line, D(X).
D(X) represents, for the range of X, the vertical distance
between the two sample within-group regressions. Two
sample estimates are needed to determine D(X):

\[
D(X) = b_2 + b_4 X
\tag{3}
\]

The line D(X) is plotted against the X and Y axes.
Here, b2 is the point at which D(X) intersects the Y
axis and b4 is the slope of D(X). The point of
intersection at the X axis is equal to -b2/b4. The Y axis,
scaled in the units of the particular posttest, reflects
the vertical distance between the two sample
within-group regressions and intersects the X axis at Y = 0.
Thus, the difference between the two regression lines is
zero at the point at which D(X) intersects the X axis.

With this information, then, a plot such as that
appearing in Panel c of Figure 2 can be constructed.
From a plot like this, one can determine the vertical
distance between the two sample within-group regressions
for a given X. Alternatively, this distance, for a given
X, can be assessed by solving for D(X) in Equation 3.
(The procedure for constructing these plots is outlined
in Appendix A.)
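
To make the computation of D(X) concrete, the sketch below fits the two within-group regressions separately and forms b2 and b4 as the experimental-minus-control differences in intercept and slope, which is what least-squares estimation of Equation 2 yields when T is coded 0 (control) and 1 (experimental). The class means are invented for illustration and are not data from the study; the function names are hypothetical.

```python
from statistics import mean


def simple_regression(x, y):
    """Ordinary least squares intercept and slope for y regressed on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical class means (pretest X, posttest Y); not data from the study.
x_ctl, y_ctl = [40, 45, 50, 55, 60], [40, 45, 50, 55, 60]
x_exp, y_exp = [40, 45, 50, 55, 60], [45, 47.5, 50, 52.5, 55]

a_ctl, slope_ctl = simple_regression(x_ctl, y_ctl)
a_exp, slope_exp = simple_regression(x_exp, y_exp)

# Differences in intercept and slope (experimental minus control).
b2 = a_exp - a_ctl
b4 = slope_exp - slope_ctl


def d(x):
    """Vertical distance between the two sample regressions at X = x."""
    return b2 + b4 * x


zero_point = -b2 / b4  # value of X at which D(X) crosses the X axis
print(d(40), d(60), zero_point)
```

In this constructed example the lines cross at X = 50, so the distance is positive for low-pretest classes and negative for high-pretest classes, while a single distance evaluated near the middle of the range would be close to zero.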

Identifying the simultaneous region of significance
involves few additional calculations (procedures are
presented in Rogosa, 1980). The unstandardized regression
coefficients b2 and b4 and elements of the corresponding
variance-covariance matrix are required for these
computations. A 100(1 - α) percent simultaneous confidence
band for the difference of the two population
within-group regressions is constructed. As noted above, the
region of significance is identified by the intersection
of the confidence band with the X axis. The region of
significance comprises those values of X that fall outside
the confidence band (see Panel c of Figure 2). If the
confidence band does not intersect the X axis, there is
no region of significance. (The procedure for constructing
a 95% simultaneous confidence band is outlined in
Appendix A.)
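
A minimal sketch of the region-of-significance computation follows. It assumes the Scheffé-type form of the simultaneous band, D(X) ± sqrt(2F) · SE[D(X)], with F the tabled critical value for 2 and N - 4 degrees of freedom and Var[D(X)] = Var(b2) + 2X·Cov(b2, b4) + X²·Var(b4). Rogosa (1980) and Appendix A remain the authority for the exact procedure used in the study. All numerical inputs and function names below are hypothetical.

```python
from math import sqrt


def band_half_width(x, var_b2, var_b4, cov_b2_b4, f_crit):
    """Half-width at X = x of the simultaneous confidence band for D(X).

    f_crit is the critical value of F with 2 and N - 4 degrees of
    freedom, taken from a table and supplied by the caller.
    """
    var_d = var_b2 + 2 * x * cov_b2_b4 + x ** 2 * var_b4
    return sqrt(2 * f_crit * var_d)


def in_region_of_significance(x, b2, b4, var_b2, var_b4, cov_b2_b4, f_crit):
    """True when the band around D(x) = b2 + b4 * x excludes zero."""
    d_x = b2 + b4 * x
    return abs(d_x) > band_half_width(x, var_b2, var_b4, cov_b2_b4, f_crit)


# Hypothetical estimates and their variance-covariance elements.
B2, B4 = 25.0, -0.5
VAR_B2, VAR_B4, COV_B2_B4 = 26.0, 0.01, -0.5
F_CRIT = 3.40  # tabled .05 value for 2 and 24 df (28 classes minus 4)

significant_x = [
    x for x in range(35, 66)
    if in_region_of_significance(x, B2, B4, VAR_B2, VAR_B4, COV_B2_B4, F_CRIT)
]
print(significant_x)
```

Scanning a grid of X values, as done here, is a simple substitute for solving the quadratic that defines the band's intersection with the X axis; with these invented inputs the regions lie at both ends of the pretest range and exclude the middle, where the two lines cross.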

If a "packaged" regression program (e.g., Nie, Hull,
Jenkins, Steinbrenner, & Bent, 1975) is used and the
terms in Equation 2 are entered stepwise, one can also
examine the ancova estimate of the treatment effect,
as well as test the homogeneity-of-regressions
assumption. The training term (T) would be entered on the
first step, followed by the covariate (X) on the second
step, with the interaction term (TX) entered on the
third and final step. Because it represents the
difference of the two sample within-group regression
coefficients, b4 provides information bearing on the
homogeneity-of-regressions assumption. When the two lines
are parallel, the slopes are equal and b4 = 0.
Conversely, when the two lines are appreciably nonparallel
(that is, when interaction exists), the slopes are
different and b4 is comparatively large. Thus, evaluating the
magnitude of b4 leads to a conclusion concerning the
assumption of homogeneity of regressions.⁶

The treatment effect that would be provided by
conventional ancova is obtained by examining the
regression coefficient associated with the treatment term
(T) at the second step of the regression procedure.
This coefficient is identical to the treatment effect
that would be obtained from a packaged ancova program
and, as such, represents the adjusted mean-difference
on the dependent variable. The pooled within-group
regression coefficient is equivalent to the regression
coefficient associated with the covariate at the
second step of the regression procedure. The pooled
slope is fundamental to ancova and, further, enables
one to compute the adjusted means.
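
The second-step quantities can also be computed by hand. The sketch below uses the standard ancova identities: the pooled within-group slope is the ratio of summed within-group cross-products to summed within-group sums of squares on X, and the adjusted mean-difference is the raw posttest difference minus the pooled slope times the pretest difference. The data and function names are hypothetical; this is an illustration and not the packaged program referred to in the text.

```python
from statistics import mean


def sums_of_squares(x, y):
    """Within-group sum of squares on x and cross-product of x and y."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxx, sxy


def ancova_adjusted_difference(x_ctl, y_ctl, x_exp, y_exp):
    """Return (pooled within-group slope, adjusted mean-difference on Y)."""
    sxx_c, sxy_c = sums_of_squares(x_ctl, y_ctl)
    sxx_e, sxy_e = sums_of_squares(x_exp, y_exp)
    pooled_slope = (sxy_c + sxy_e) / (sxx_c + sxx_e)
    raw_difference = mean(y_exp) - mean(y_ctl)
    adjusted = raw_difference - pooled_slope * (mean(x_exp) - mean(x_ctl))
    return pooled_slope, adjusted


# Hypothetical class means with parallel within-group lines:
# control Y = X, experimental Y = X + 3, experimental group lower on X.
x_ctl, y_ctl = [44, 48, 52, 56], [44, 48, 52, 56]
x_exp, y_exp = [42, 46, 50, 54], [45, 49, 53, 57]

pooled_slope, adjusted_difference = ancova_adjusted_difference(
    x_ctl, y_ctl, x_exp, y_exp
)
print(pooled_slope, adjusted_difference)
```

Here the raw posttest difference is only 1 point, but because the experimental group starts 2 points lower on the pretest and the pooled slope is 1, the adjusted mean-difference recovers the constant 3-point separation between the two parallel lines.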

Thus, assessing treatment effects on student
achievement comprises three steps: For each dependent
measure, (a) the sample within-group regressions were
examined; (b) J-N was employed to identify possible
regions on the covariate in which the two within-group
regressions are significantly different; and (c) the ancova
estimate of the treatment effect was examined, an estimate
representing the treatment effect for the "average"
individual over the range of X.

⁶ An interesting observation is that useful regions of
significance can be obtained even where one fails to reject
the null hypothesis $\beta_4 = 0$ (Rogosa, 1981).

Pooling Grades

Table 8 presents the number oC "quasi-classes , " 
by grade and experii^en tal condition. As used here, a 
quasi-class comprised only those students in a particular 
grade in a class. in the case where a teacher had, say, 
only fourth-grade students, the class and the quasi-class 
were identical. In contrast, the quasi-class merely 
was a subset of the class for the teacher who had a fourth- 
fifth combination . 

Table 8

The Number of Quasi-Classes,
by Grade and Experimental Condition

Experimental                  Grade
Condition                4      5      6

Control Group            7     10      4
Experimental Group       9      9      5

It is clear from this table that the within-grade
number of control-group and experimental-group classes
is small (too small, in fact, to warrant a separate
analysis of treatment effects on student achievement for
each grade). Consequently, all analyses were conducted
with different grade levels combined. This was accom-
plished through a linear transformation of the CTBS raw
scores. First, each student's raw score was converted
to a T score (M = 50, SD = 10). This was done separately
for each of the three grades. These T scores were then
aggregated at the class level, yielding a mean T-score
for each class.
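
The pooling procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the study's actual computation: the record layout (`grade`, `class_id`, `raw`) and the function names are hypothetical, and the population standard deviation is used because the report does not say which form was applied.

```python
from collections import defaultdict
from statistics import mean, pstdev


def within_grade_t_scores(records):
    """Convert raw scores to T scores (M = 50, SD = 10) separately per grade.

    `records` is a list of dicts with hypothetical keys 'grade', 'class_id',
    and 'raw'. Assumes each grade has at least two distinct raw scores.
    """
    raw_by_grade = defaultdict(list)
    for rec in records:
        raw_by_grade[rec["grade"]].append(rec["raw"])
    # Assumption: population SD; the report does not specify the divisor.
    grade_stats = {g: (mean(v), pstdev(v)) for g, v in raw_by_grade.items()}
    t_scores = []
    for rec in records:
        m, sd = grade_stats[rec["grade"]]
        t_scores.append(50.0 + 10.0 * (rec["raw"] - m) / sd)
    return t_scores


def class_means(records, t_scores):
    """Aggregate T scores to the class level: {class_id: (mean T, n students)}."""
    by_class = defaultdict(list)
    for rec, t in zip(records, t_scores):
        by_class[rec["class_id"]].append(t)
    return {c: (mean(v), len(v)) for c, v in by_class.items()}


# Made-up data: class B is a fourth-fifth combination class.
records = [
    {"grade": 4, "class_id": "A", "raw": 10},
    {"grade": 4, "class_id": "A", "raw": 20},
    {"grade": 4, "class_id": "B", "raw": 30},
    {"grade": 5, "class_id": "B", "raw": 40},
    {"grade": 5, "class_id": "B", "raw": 60},
]
t_scores = within_grade_t_scores(records)
means = class_means(records, t_scores)
```

Because the transformation is applied within grade, a student in a combination class is scaled against his or her own grade before the class mean is taken; the class size returned alongside each mean is what a weighted analysis would use.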

As was noted above, each class mean was weighted by 
the corresponding number of students. This entailed 
assigning each student the mean of his class. Because 
the computer consequently treated the student as the 
unit of analysis, however, certain statistics (e.g., 
F ratio, standard error) needed to be adjusted to reflect 
the actual number of classes represented in the analyses. 
(The procedures for adjusting these statistics are out- 
lined in Appendix B.) 

Descriptive Statistics and Within-Group Regressions

The CTBS pretest and posttest means and standard 
deviations are presented in Table 9. Table 10 presents
the intercorrelations among the measures; the pretest- 
posttest correlation for each measure is reported in the 
diagonal. As can be seen from Table 10, the pretest- 
posttest correlation for the total score is .84— a value 
representing the relative stability of performance over 
a 12-month period. 

The sample within-group regression equations for the
dependent measures are presented in Table 11. These
within-group regressions were plotted and, accompanied
by the corresponding within-group scatterplots, appear
in Figures 3-8. The actual range of data for each group 
Table 9

CTBS Pretest and Posttest Means and Standard Deviations

                        Control        Experimental       Pooled(a)
                        (N = 13)         (N = 15)         (N = 28)
Test                    M      SD        M      SD        M      SD

Pretest
  Reading Total       50.40   5.71     49.68   4.82     50.00   5.24
  Mathematics Total   50.15   5.23     49.88   4.56     50.00   4.87
  Total Score         50.29   5.78     49.76   4.87     50.00   5.30

Posttest
  Reading Total       50.98   5.32     49.21   4.83     50.00   5.13
  Mathematics Total   50.37   4.54     49.70   4.07     50.00   4.29
  Total Score         50.70   4.99     49.43   4.63     50.00   4.83

Note: N = 28. Grades were pooled through a within-grade
T-score transformation (M = 50, SD = 10) of student-level
scores.

(a) Experimental conditions pooled.
Table 10

Intercorrelations of CTBS Subtests and Total Score

                            1       2       3

1. Reading Total          (86)     91      98
2. Mathematics Total       86     (77)     98
3. Total Score             95      96     (84)

Note: N = 28. Grades were pooled through a within-
grade T-score transformation (M = 50, SD = 10) of
student-level scores. Weighted between-class cor-
relations are reported; decimals have been omitted.
Pretest correlations appear above the diagonal,
posttest correlations appear below the diagonal,
and the pretest-posttest correlation for each
measure appears in the diagonal.



Table 11

Within-Group Regressions(a) of CTBS Posttest on Pretest

                          Control (N = 13)                Experimental (N = 15)
Test                  r      a       b     SE(b)       r      a       b     SE(b)

Reading Total        .82   12.290   .768   .159       .91    3.995   .910   .117
Mathematics Total    .70   19.915   .607   .186       .85   11.983   .756   .131
Total Score          .79   16.463   .681   .160       .90    7.069   .851   .117

Note: N = 28. Grades were pooled through a within-grade T-score trans-
formation (M = 50, SD = 10) of student-level scores.

(a) r = pretest-posttest correlation.
    a = intercept constant.
    b = unstandardized regression coefficient.
    SE(b) = standard error of b.
Figure 3. Within-group scatterplots: Reading Total.

Figure 5. Within-group scatterplots: Mathematics Total.

Figure 6. Within-group regressions: Mathematics Total.

Figure 7. Within-group scatterplots: Total Score.

Figure 8. Within-group regressions: Total Score.
corresponds to the range of X for which the particular
regression was plotted.

As these figures illustrate, each pair of within- 
group regressions is similar: Vertical displacement at 
any point on X is, at most, slight. From a mere visual 
inspection of these pairs of within-group regressions, 
then, it does not appear that the treatment had an 
appreciable effect on end-of-year student achievement. 
Johnson-Neyman Analyses 

Table 12 presents, for each dependent measure, the
sample regression coefficients b_2 and b_4 corresponding
to Equation 2 and reported in the final step of the re-
gression procedure. Again, the regression coefficients
b_2 and b_4 are associated with, respectively, the treat-
ment term (T) and the term representing the interaction
of treatment and covariate (TX). Additionally, three
entries in the variance-covariance matrix (footnote 7) of
these regression coefficients are needed for J-N: the variance
of b_2 (denoted s_22), the covariance of b_2 and b_4 (denoted
s_24), and the variance of b_4 (denoted s_44). Table 13
presents these three values for the dependent measures.



7 

Although there are several packaged programs that generate 
this matrix, the simplest by far is the SAS SYSREG pro- 
cedure (SAS Institute Inc., 1979). All one does, in 
addition to specifying the regression model, is request 
option COVB. This option outputs the variance-covariance 
matrix of the unstandardized regression coefficients for 
the specified model. 
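
To show how these three matrix entries enter the J-N computations, the following Python sketch evaluates the difference between the two within-group regressions, D(X) = b_2 + b_4(X), together with its estimated standard error at a given pretest value. It assumes only the standard variance of a linear combination of two coefficients; the function names are hypothetical, and the Total Score values are taken from Tables 12 and 13. No critical value is computed, because that depends on the adjustments described in the appendices.

```python
import math


def d_of_x(x, b2, b4):
    """Vertical distance between the two within-group regression lines at x."""
    return b2 + b4 * x


def var_d_of_x(x, s22, s24, s44):
    """Var[b2 + b4*x] = s22 + 2*x*s24 + x^2*s44 (variance of a linear combination)."""
    return s22 + 2.0 * x * s24 + x * x * s44


# Total Score coefficients (Table 12) and matrix entries (Table 13).
B2, B4 = -9.394, 0.171
S22, S24, S44 = 102.5227, -2.0271, 0.0405


def ratio(x):
    """D(x) divided by its estimated standard error (a pointwise t-like ratio)."""
    return d_of_x(x, B2, B4) / math.sqrt(var_d_of_x(x, S22, S24, S44))


if __name__ == "__main__":
    # Pretest values spanning the observed range reported in Appendix A.
    for x in (40.7, 45.0, 50.0, 55.0, 59.6):
        print(f"X = {x:5.1f}  D(X) = {d_of_x(x, B2, B4):7.3f}  ratio = {ratio(x):6.3f}")
```

A region of significance would consist of those pretest values at which the squared ratio exceeds the chosen critical value. Because the inputs are rounded as tabled, the results are approximate.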
Table 12

Selected Regression Coefficients(a) for Full J-N Equation

Test                 Coefficient       b       SE(b)      F

Reading Total           b_2         -8.294    10.071     < 1
                        b_4           .142      .200     < 1

Mathematics Total       b_2         -7.931    11.170     < 1
                        b_4           .149      .222     < 1

Total Score             b_2         -9.394    10.125     < 1
                        b_4           .171      .201     < 1

Note: N = 28. Grades were pooled through a within-grade T-score
transformation (M = 50, SD = 10) of student-level scores.

(a) b_2 = unstandardized regression coefficient associated
    with the treatment term (T).
    b_4 = unstandardized regression coefficient associated
    with the interaction term (TX).
    SE(b) = standard error of b.
Table 13

Elements(a) of the Variance-Covariance Matrix
of Regression Coefficients for Full J-N Equation

Test                    s_22        s_24       s_44

Reading Total         101.4323    -2.0059     .0401
Mathematics Total     124.7725    -2.4616     .0491
Total Score           102.5227    -2.0271     .0405

Note: N = 28. Grades were pooled through a
within-grade T-score transformation (M = 50,
SD = 10) of student-level scores.

(a) s_22 = variance of b_2.
    s_24 = covariance of b_2 and b_4.
    s_44 = variance of b_4.
Figure 9. D(X) and 95% simultaneous confidence band: Reading Total.
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procedure, iz viil be recalled, one is testing the assuir.p- 
L^on recardincj hcnogeneicy of regressions. The F ratios 
associated v;ith (see Table 12) indicate, by conven- 
tional standards, that this assumption was not violated 
for any of the dependent rr:easures. Thus, although some 
of the pairs of within-group regression lines are non- 
parallel, zhc degree of he terouene: ty present is not suf- 
ficient to Lc statistically significant. 

Table 14 presents the ancova estimate of the treat-
ment effect and the corresponding 95% confidence interval
for each measure. (The procedure for constructing these
confidence intervals is outlined in Appendix C.) As can
be seen from this table, these estimates of the "average"
treatment effect are small and not significantly different
from zero. Further, the 95% confidence interval in each
case spans zero (which, of course, is expected with chance
findings) and extends comparatively far into the positive
region. It is conceivable, then, that with a replication
study the obtained treatment effects all could be positive
(though nevertheless statistically nonsignificant). The
pooled within-group regression coefficients and the ancova-
adjusted posttest means are presented in Table 15.
Summary

The teacher training was ineffective in improving
student achievement. This conclusion held when (a) regions
on the covariate were sought in which the difference
of the within-group regressions was statistically
significant and (b) the ancova estimate of the "average"
treatment effect was examined. This result is not sur-
prising, of course, in view of the poor treatment imple-
mentation.

Table 14

Ancova Estimates of Treatment Effects,
With 95% Confidence Intervals

                     Treatment             Lower and Upper End-Points of
Test                 Effect         F      95% Confidence Interval

Reading Total        -1.171       1.234    -3.333,  .991
Mathematics Total    - .481       < 1      -2.895, 1.933
Total Score          - .866       < 1      -2.999, 1.267

Note: N = 28. Grades were pooled through a within-grade T-score
transformation (M = 50, SD = 10) of student-level scores.

Table 15 (pooled within-group regression coefficients and
ancova-adjusted posttest means; only partly legible in the
source). Surviving entries: Total Score, pooled coefficient .75;
adjusted posttest means of 50.27 and 49.73 in one row and
49.61 in the following row.

Note: Grades were pooled through a within-grade
T-score transformation (M = 50, SD = 10) of student-level
scores.

Results: The Relationship Between CTR and Achievement
In addition to examining the effects on student
achievement of the intervention, one can pool experimental
conditions and carry out process-product analyses. That
is, correlations can be obtained between the teachers'
conformity-to-recommendations (CTR) and student achieve-
ment, whether the former was naturally occurring or
attributable to the training.

Thus it is acknowledged that, irrespective of experi- 
mental condition, there will be variability in CTR. To be 
sure, not all experimental-group teachers would be expected 
to demonstrate the same degree of CTR; teacher attitudes, 
beliefs, motivations, and so on, doubtless are operating 
here. And the assumption would not be made that, by virtue 
of their group assignment, control-group teachers would 
demonstrate no CTR whatsoever. On the contrary, one would 
expect natural variability in CTR here, as well.

Such an analysis is informative in the present context 
in that it yields additional evidence concerning the rele- 
vance of the teacher training to student achievement. In 
this sense, the "effects" of a program or treatment can be 
evaluated by examining all teachers, regardless of the experimental
condition to which they initially had been assigned.

In the following analysis, a full-year measure of total
CTR was obtained by averaging total CTR across the four
occasions; this served as the "process" measure. "Product"
was a residual score based on the CTBS posttest total. These
residuals were obtained by regressing the CTBS posttest total
on the pretest at the student level. (The difference between
the obtained and the predicted score is the residual and
represents performance on the posttest that is uncorrelated
with pretest performance.) These residuals were then aggre-
gated at the class level and, in turn, correlated with the
process measure (footnote 8). The result is a part correlation.
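
The residual-based analysis just described can be sketched as follows. This is a minimal illustration rather than the original computation: the function names and the toy data are hypothetical, and the class means are left unweighted, in keeping with footnote 8.

```python
from collections import defaultdict
from statistics import mean


def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def class_mean_residuals(pre, post, class_ids):
    """Regress posttest on pretest at the student level, then average
    the residuals (obtained minus predicted) within each class."""
    intercept, slope = ols_fit(pre, post)
    by_class = defaultdict(list)
    for x, y, c in zip(pre, post, class_ids):
        by_class[c].append(y - (intercept + slope * x))
    return {c: mean(r) for c, r in by_class.items()}


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5


# Toy student-level data (made up) for three classes.
pre = [45.0, 50.0, 55.0, 48.0, 52.0, 60.0]
post = [47.0, 49.0, 58.0, 46.0, 55.0, 59.0]
class_ids = ["A", "A", "B", "B", "C", "C"]
full_year_ctr = {"A": 0.40, "B": 0.55, "C": 0.70}  # hypothetical CTR scores

residuals = class_mean_residuals(pre, post, class_ids)
classes = sorted(residuals)
r_part = pearson_r([full_year_ctr[c] for c in classes],
                   [residuals[c] for c in classes])
```

Only the achievement measure has the pretest removed; the CTR measure is left as observed, which is why the result is a part (semipartial) correlation rather than a partial correlation.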

The resulting correlation between total CTR and residual
achievement is r = .29 (p > .10). This correlation in-
creases considerably, however, with the removal of one
discrepant case (r = .40, p < .05). Figure 12 presents
the scatterplot for this correlation, with the outlier
identified. Thus the training as a whole, derived from pre-
vious process-product research (Brophy & Evertson, 1974;
McDonald & Elias, 1976; Soar, 1973; Stallings & Kaskowitz,
1974), ostensibly has some pedagogical value in the present
context.



8 

Unlike the analyses presented in the previous chapter, here 
class mean-achievement was not weighted for differences in 
class size. Because it would have entailed similarly weighting
CTR (a teacher variable, the weighting of which is inappro-
priate), weighting was deemed undesirable for this analysis.
Figure 12. Scatterplot of full-year CTR and residual
achievement. (Discrepant case is denoted by "+".)
Discussion and Recommendations
As an experiment, this study failed to corroborate 
the positive results obtained previously in similar classroom-
based experiments (Anderson et al., 1979; Crawford
et al., 1978; Good & Grouws, 1979; Stallings et al., 1979).
At the end of the school year, the experimental-group
teachers did not evidence markedly greater conformity
to the training recommendations than that exhibited by
the control-group teachers. Further, the classes of these 
two groups of teachers were not appreciably different in 
end-of-year academic achievement. 
Poor Treatment Implementation 

There was a priori reason to expect the desired change 
in teaching practices among experimental-group teachers. 
After all, the intervention in the present study was the 
same as the "minimal training" condition in the study by 
Crawford et al. (1978). In that study, it will be recalled, 
the minimally and maximally trained teachers exhibited 
similar conformity to the recommendations, and both 
experimental groups evidenced marked superiority over 
the control group. Further, the experiments conducted
by Anderson et al. (1979) and Good and Grouws (1979)
similarly did not involve a comprehensive delivery system.
And, as reported above, both studies resulted in posi-
tive change in training-related teaching practices
among experimental-group teachers.
Here, it will be argued that poor treatment implemen-
tation in the present study was due in large part to
several methodological and contextual differences be-
tween this study and those conducted by Crawford et al.
(1978), Anderson et al. (1979), and Good and Grouws
(1979).

Classroom observations. Classes in the present
study were observed for a maximum of eight hours through- 
out the entire school year: two two-hour periods in 
both the fall and spring. Crawford et al. (1978), in 
contrast, obtained classroom observations for approxi- 
mately 16 full days throughout the school year— before, 
during, and after treatment. 

While the manifest function of classroom observations 
is to obtain information concerning classroom character- 
istics and events, the latent function of such obser- 
vations may be to facilitate treatment implementation. 
Minimally trained teachers in the Crawford et al. (1978)
study unwittingly may have come to regard the relatively
frequent and lengthy classroom observations as a kind of
supervision or monitoring. If so, the conduct of class-
room observations likely would have enhanced the compli-
ance of these experimental-group teachers with the train-
ing recommendations. The failure of experimental-group
teachers in the present study to implement the training
recommendations, then, may have resulted from the rela-
tively infrequent and brief classroom observations. That
is, if experimental-group teachers perceive the con-
duct of classroom observations as a supervisory mechan-
ism, then poor implementation might be attributed to
their receiving much less of such supervision.

The plausibility of this conjecture must be evaluated
in view of the finding reported by Anderson et al. (1979).
As noted above, there were two experimental groups: trained
and observed, and trained but unobserved. Analyses of
end-of-year achievement data indicated that both groups
were equally superior to the control group in improving
reading achievement. If one assumes that equal effective-
ness in improving achievement must have been accompanied
by correspondingly equal conformity to the training recom-
mendations, these results suggest that the absence of
observers in the classes of experimental-group teachers
does not reduce treatment implementation. Thus, it might
be argued, the comparatively low amount of classroom
observations in the present study cannot be held respon-
sible for the ineffectiveness of its training in bringing
about the desired changes in teaching practices among
experimental-group teachers.

There remains, however, a fundamental difference 
between the present study and the one conducted by 
Anderson et al. (1979). 
Initial meetings with teachers. In the present study,
teachers never met with project staff for discussion,
question-and-answer, and so on; the TEP simply were mailed
to the experimental-group teachers. Anderson et al.
(1979), in contrast, met twice with all experimental-
group teachers (i.e., including those trained but unob-
served): once to describe the purpose of the study and
distribute the training material, and the second time to dis-
cuss the instructional model presented in the training
material.

These meetings likely fostered treatment implemen- 
tation. First, the meetings doubtless were informative, 
facilitating understanding of the instructional model 
and its applicability. Second, by holding these meetings, 
the project staff were in a position to communicate enthu- 
siasm for the training and personal concern for the 
teachers. The teachers' perception of that enthusiasm 
and concern could effect a favorable disposition of the 
teachers to the overall project and, in turn, enhance 
subsequent implementation. In short, these initial 
meetings with teachers may have served to address the 
three general factors affecting implementation of change 
proposals that were outlined by Doyle and Ponder (1977) 
and discussed above: instrumentality, congruence, and 
cost. The conduct of the two meetings in the Anderson
et al. (1979) study, then, may have offset the absence
of classroom observers for the trained-but-unobserved
teachers.

Clearly, the relative effects on treatment implementa-
tion of classroom observations and treatment delivery
remain an open question. The implication of the mini-
mal-maximal parity reported by Crawford et al. (1978) is
clouded by the comprehensive observations conducted in
the classrooms of all teachers. Similarly, the implica-
tion of the trained-but-unobserved finding reported by
Anderson et al. (1979) is obscured by the initial meet-
ings attended by all teachers. And neither of these fac-
tors was manipulated in the study conducted by Good and
Grouws (1979): All teachers attended initial meetings,
and observations were conducted in all classrooms.

The SES context. A third factor possibly attenuat-
ing treatment implementation in the present study is the
urban low-SES context in which it was conducted. The
Crawford et al. (1978) study, it will be recalled, was
conducted in a middle-SES school district, as was the
study conducted by Anderson et al. (1979). Good and
Grouws (1979) reported only that most of the partici-
pating schools were in low-SES "areas" (p. 356). (With-
out more information on this sample, it is difficult to
discuss context effects on treatment implementation.
The assumption will be made nevertheless that the sample
in the present study was lower in socioeconomic status
than that in the Good and Grouws [1979] study.)

The context in the present study may have made it too
difficult for the experimental-group teachers to respond
positively and cooperatively to the training. The demands
of teaching in an urban low-SES climate doubtless are
quite different from those in other contexts. Indeed,
Levy (1979) likened teaching in the urban school to engag-
ing in combat. Further, students not uncommonly are ill-
prepared for the motivational and cognitive demands of
classroom processes, posing additional problems for teach-
ers. A local newspaper, in fact, reported that this
particular school district fell "at the bottom of the
heap" (San Francisco Chronicle, November 9, 1979, p. 5)
among the some 1,000 districts in California in average
performance on a state-wide proficiency test administered
during the school year in which the present study was con-
ducted.

Treatment implementation, then, may have been atten- 
uated by the context of the intervention. Although all 
teachers were volunteers and, hence, presumably disposed, 
initially at least, to cooperate in it, they may have been 
too distracted by their more difficult regular teaching 
activities to be able to comply with the training recom-
mendations.



Proposition 13. In June 1978, Proposition 13 was
passed in California and resulted in a number of measures
to reduce property taxes and limit government spending. 
In addition to its large impact on school finance, Propo- 
sition 13 had other, perhaps equally serious, repercus- 
sions. The president of the California State Board of 
Education stated that 

Proposition 13. . . .had a significant effect on 
the morale of teachers and other public employees. 
Teachers feel that the state has abrogated all 
collective bargaining agreements — declaring them 
null and void and canceling all pay raises. 
Teachers realize that their jobs are tied to very 
uncertain revenue sources. . . .As uncertainty 
increases over the state bailout in future years, 
this morale problem may become worse. (Kirst,
1979, p. 431)

Empirical evidence supporting this appraisal was pro-
vided by Calfee and Pessirilo-Jurisic (1979), who inter-
viewed 81 teachers, 17 principals and vice-principals,
and six administrators to gain "insight into the nature
of declining morale among public school teachers in
California" (p. 4). The interviews were conducted in a
large school district in the San Francisco Bay Area dur-
ing the school year in which the present study was con-
ducted, the first school year following the passing of
this initiative.

The results indicated that many of these 104 educa- 
tors perceived disconcerting changes in California educa- 
tion—changes they attributed primarily to Proposition 13. 
By and large, "educators felt that they were working harder
than ever, under worsening conditions and receiving fewer 
rewards — both psychologically and financially" (p. 20) . 

In retrospect, the state of affairs resulting from
Proposition 13 may not have provided suitable conditions 
for a study that called upon teachers to expend additional 
time and energy. Thus, the "climate" of the California 
public school in the early Proposition 13 era may have 
attenuated treatment implementation in the present study. 
Although the teachers volunteered for the experiment in 
September 1978, three months after Proposition 13 was
passed, its effects on teacher morale likely grew stronger
after the school year got underway and teachers and adminis-
trators had further opportunity to consider its immediate
and potential effects. Teacher morale, then, may have
suffered after the teachers volunteered to participate
in the study and during the school year in which it was
conducted.

Absence of Treatment Effects on Student Achievement 

Clearly, in the present study, treatment implemen-
tation was a necessary condition for treatment effects
on student achievement. Analyses of the classroom obser- 
vation data indicated that relevant teaching practices 
were not altered appreciably by the intervention. This, 
indeed, appears to be the most compelling reason for the 
absence of treatment effects on student achievement. 
There are, however, additional, ostensibly plausible,
reasons for the absence of such effects. 

Irrelevant teaching recommendations. First, if the
teaching recommendations contained in the TEP were actually
unrelated to student achievement, one obviously would not
expect treatment effects on this criterion in the present
study. It is unlikely, however, that this is responsible
for the obtained results. The recommendations, it will be
recalled, were based on previous process-product correla-
tions obtained from studies of teaching and learning in
regular classrooms. Further, many of the process-product
correlations that were incorporated into the TEP were ori-
ginally obtained in a low-SES context. Thus, it seems
fair to assume that the teaching recommendations were
related to student achievement, that is, achievement in a
school district like the one in which the present study was
conducted. And the obtained correlation between CTR and resi-
dual achievement lends support to this assumption.

Inappropriate dependent measure. Second, it is pos-
sible that there were not treatment effects on student
achievement because, in part, the dependent measure (a
standardized achievement test) was inappropriate for
evaluating treatment of this kind. To be sure, stan-
dardized achievement tests typically have high reliability
and adequately reflect prevailing curricular trends
(e.g., Sax, 1974). But, as Berliner (1977) has argued,
these tests may have poor content validity at the class-
room level. Further, because they often correlate
substantially with measures of general intelligence, 
test items may not be very reactive to instruction. 
"Off-the-shelf standardized tests," Berliner (1977) 
contended, "make poor dependent variables for studies 
of teaching" (p. 148) . 

The principal investigators were prevented by the
school district's central administration from adminis-
tering a specially constructed achievement test to the
students. Their concern, justifiably, was to avoid 
excessive testing. Consequently, the CTBS was used as 
the measure of achievement — a test that was routinely 
administered to the students as part of the school dis- 
trict's regular testing program. 

While a specially constructed test may be more 
reactive to classroom instruction and, hence, a more
appropriate dependent measure in research on teaching,
there is ample precedent for the fruitful use of stan-
dardized achievement tests in such research. For example,
the large-scale correlational studies conducted by Brophy
and Evertson (1974), McDonald and Elias (1976), Soar (1973),
and Stallings and Kaskowitz (1974) employed standardized
achievement tests as dependent measures. And each of these
studies yielded substantive findings concerning process-
product relationships. Further, three of the four classroom-
based experiments employed standardized achievement
tests as the dependent measure; each experiment obtained 
positive results. 

An interesting finding related to this issue was
reported by Good and Grouws (1979). In addition to
administering a subtest of a standardized achievement
test, they administered a "content test" specially con-
structed for the particular school district in which
their study was conducted. While there was a strong and
positive treatment effect on achievement as measured
by the standardized test, there was not a comparable
treatment effect on achievement as measured by the con-
tent test (although the mean difference was in favor of
the experimental group). Ostensibly, the former was more
sensitive, or reactive, to the treatment than the latter.
(There was, however, a possibility of a ceiling effect
on the content test; subsequent analyses should clarify
this result.)

There is evidence, then, to support the use of 
standardized achievement tests in this kind of research. 
Perhaps, though, the issue ought not be phrased in an
either-or fashion; rather, the choice of a dependent
measure should be made in view of the articulated goals
of the intervention. If one hypothesizes that an inter-
vention should improve student knowledge of the concepts,
principles, and processes held in common by many curricula,
a standardized achievement test would appear to be an
appropriate criterion. If, in contrast, the hypothesized
effects of an intervention call for a measure that is
much more sensitive to the instructional goings-on of
the particular classroom, a specially constructed test
would probably be more suitable. Both, however, can yield
useful and complementary information concerning the
effects of an intervention; perhaps an intervention is
best evaluated by employing both, rather than one or the
other (Sax, 1974, p. 261). 

Summary. This section is concluded as it was begun.
Although there are several ostensibly plausible reasons 
for the absence of treatment effects on student achieve- 
ment, the most compelling reason for these results appears 
to be poor treatment implementation: Training-related 
behavior among experimental-group teachers simply was 
not modified enough to effect appreciable change in 
subsequent student achievement.

Recommendations for Subsequent Research

The results of the present study call into question
the effectiveness of a minimal intervention. It would
appear that, for an intervention to be successful, the
project staff has to be "engaged" with the participating
teachers in some fashion, for example, through meetings
and frequent classroom observations.
The previous research, however, does not provide
clear implications concerning the relative contributions
of holding meetings and conducting classroom observations.
Needed are studies that incorporate these features into
the design systematically. That is, "meetings with teachers"
and "classroom observations" would be design factors and
independently manipulated. If these factors were dichoto-
mous (e.g., one introductory meeting versus several work-
shops; a few brief observations versus frequent and lengthy
observations), each experimental-group teacher would be
randomly assigned to one of the four possible combinations.
In addition to examining the main effect of each factor
on outcome, one could examine possible interactions.
Perhaps a small number of observations combined with
several workshops produces maximum treatment implemen-
tation and, in turn, the largest increments in student
achievement.
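
As a rough illustration of the proposed design, the sketch below randomly assigns teachers to the four combinations of the two dichotomous factors. The level labels paraphrase the examples given above; the function name, the teacher identifiers, and the choice to keep the cells as equal in size as possible are assumptions made for the illustration only.

```python
import random
from itertools import product

# Levels of the two design factors, paraphrasing the examples in the text.
MEETINGS = ("one introductory meeting", "several workshops")
OBSERVATIONS = ("few brief observations", "frequent lengthy observations")


def assign_factorial(teachers, seed=None):
    """Randomly assign each teacher to one cell of the 2 x 2 design.

    Teachers are shuffled and then dealt to the four cells in turn, so the
    cell sizes differ by at most one (an assumption; the text says only
    that assignment would be random).
    """
    rng = random.Random(seed)
    cells = list(product(MEETINGS, OBSERVATIONS))
    shuffled = list(teachers)
    rng.shuffle(shuffled)
    return {t: cells[i % len(cells)] for i, t in enumerate(shuffled)}


# Hypothetical roster of eight experimental-group teachers.
roster = [f"teacher_{i:02d}" for i in range(1, 9)]
assignment = assign_factorial(roster, seed=1979)
```

With the cells populated in this way, the main effect of each factor and the meetings-by-observations interaction could both be estimated from the same sample.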

Further, one must consider contextual factors that
may limit the feasibility of a minimal intervention.
Some contextual factors could be incorporated into the
design to examine possible main effects and interactions.
It is possible, for example, that "class SES" and "meetings
with teachers" interact in their effects on outcome.
Teachers of low-SES classes may require relatively fre-
quent and intensive meetings with project staff concerning
the training program and its applicability to their
particular teaching environment. Teachers of middle-
SES classes, in contrast, might be able to profit from
relatively few and brief meetings.

The Promise of the Minimal Intervention 
in Research on Teaching 

Good and Grouws (1979) argued that their findings,
along with those of Anderson et al. (1979) and Crawford
et al. (1978), indicated that classroom-based experiments

are capable of yielding improvements in student 
learning that are practically as well as statis- 
tically significant. Such data are an important 
contradiction to the frequently expressed attitudes 
that . . .brief, inexpensive treatments cannot 
hope to bring about significant results. (p. 361) 

The results of this study should serve to temper such
optimism concerning the promise of the minimal intervention
in research on teaching.
APPENDIX A 

Procedures for Plotting the Line D(X) 
And Constructing a 95% Simultaneous Confidence Band 
Procedures are outlined here for plotting the line 
D(X) and constructing a 95% simultaneous confidence band. 
(These procedures are adapted from Rogosa [1980].) The 
pretest (X) and posttest (Y) of the CTBS total score are 
used as an example. 
Plotting the Line D(X) 

The first task in plotting the line D(X), of course, 
is to define the line. The line D(X) is defined as 
D(X) = b_2 + b_4 X or, in the case of the CTBS total score, 
D(X) = -9.394 + .171(X) (see Table 12). 

This line is plotted against the X and Y axes. The 
range of data for X, the pretest, is 40.7 to 59.6. The Y 
axis, scaled in the units of the posttest, reflects the 
vertical distance between the two sample within-group regression 
lines and intersects the X axis at Y = 0. Thus, the 
difference between the two regression lines is zero at the 
point at which the line D(X) intersects the X axis. 

By using the equation D(X) = -9.394 + .171(X), one 
can determine D(X) for the minimum (40.7) and maximum (59.6) 
values obtained for X. The two resulting points, (40.7, -2.43) 
and (59.6, .80), are plotted and joined (see Figure 11). 
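
The two endpoints can be checked with a short calculation. The sketch below uses only the intercept, slope, and pretest range quoted above; the variable names are illustrative, and the crossing point is simply the value of X at which the line equals zero.

```python
# Intercept and slope of D(X) for the CTBS total score, as quoted in the text.
INTERCEPT = -9.394
SLOPE = 0.171

def d_of_x(x):
    """Vertical distance between the two within-group regression lines at X = x."""
    return INTERCEPT + SLOPE * x

x_min, x_max = 40.7, 59.6  # observed range of the pretest

# Endpoints of the plotted line, rounded to two decimals as in the text.
endpoints = [(x, round(d_of_x(x), 2)) for x in (x_min, x_max)]

# Value of X at which D(X) = 0, i.e., where the regression lines cross.
x_cross = -INTERCEPT / SLOPE
```

Because the crossing point (roughly 54.9) lies inside the observed pretest range, the sign of the estimated difference changes within the data.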
Constructing the Confidence Band 

Additional statistics are needed to construct such a 
band. The weighted average of the two group means is 
denoted C_a. (C_a is the point at which the ancova estimate 
of the treatment effect is evaluated.) C_a = -s_24/s_44, 
where s_24 is the covariance of b_2 and b_4 (the intercept 
and slope of the line D(X)) and s_44 is the variance of b_4 
(see Table 13). For the CTBS total score, 
C_a = 2.0271/.0405 = 50.052. 

D(C_a) is the difference of the two sample within-group 
regression lines evaluated at C_a. That is, D(C_a) is the 
D(X) at C_a (which is equivalent to the ancova estimate of 
the treatment effect, or the adjusted mean-difference on Y). 

The estimated variance of D(C_a) is denoted s^2_D(C_a), and is 
equal to s_22 + s_24 C_a. The first term, s_22, is the variance 
of b_2, the intercept of the line D(X) (see Table 13). Thus, 
in the present example, 
s^2_D(C_a) = 102.5227 - 2.0271(50.052) = 1.062. 
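
These two quantities can be reproduced from the three variance-covariance elements quoted in the example. The sketch below is illustrative only: the variable names are assumptions, the negative sign of the covariance is inferred from the arithmetic shown in the text, and the point is rounded to three decimals before the variance is computed because that is how the worked example proceeds.

```python
# Elements of the variance-covariance matrix, as quoted in the worked example.
S_INTERCEPT = 102.5227   # variance of the intercept of D(X)
S_COV = -2.0271          # covariance of intercept and slope (sign inferred)
S_SLOPE = 0.0405         # variance of the slope of D(X)

# Point on X at which the ancova estimate is evaluated.
c_a = round(-S_COV / S_SLOPE, 3)

# Estimated variance of D(X) at that point.
var_d_ca = S_INTERCEPT + S_COV * c_a
```

Carrying the unrounded point through the second step changes the variance only in the third decimal place.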

Also needed to construct a simultaneous confidence 
band for the line D(X) is the estimated variance of D(X), 
s^2_D(X): s^2_D(X) = s^2_D(C_a) + s_44 (X - C_a)^2. At X = 40.7, 
for example, s^2_D(X) = 1.062 + .0405(40.7 - 50.052)^2 = 4.604. 
This variance is calculated similarly for other values of X. 
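
The variance calculation can be written as a small function and applied to any value of X. The sketch below is a hedged illustration: the function and argument names are assumptions, and the cross-check against the general variance of a linear combination (intercept variance plus 2X times the covariance plus X squared times the slope variance) is a standard identity added here for verification, not a step given in the text.

```python
def var_d_x(x, var_at_point=1.062, slope_var=0.0405, point=50.052):
    """Estimated variance of D(X): the minimum variance plus a term that
    grows with the squared distance of x from the evaluation point."""
    return var_at_point + slope_var * (x - point) ** 2

def var_d_x_general(x, s_int=102.5227, s_cov=-2.0271, s_slope=0.0405):
    """Same quantity from the general formula for the variance of
    (intercept + slope * x); the covariance sign is inferred."""
    return s_int + 2 * x * s_cov + x ** 2 * s_slope

example = round(var_d_x(40.7), 3)

# The two forms agree up to the rounding of the published values.
difference = abs(var_d_x(40.7) - var_d_x_general(40.7))
```

The variance is smallest at the evaluation point and increases symmetrically on either side of it, which is what gives the confidence band its hyperbolic shape.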

A 100(1 - α) percent simultaneous confidence band for 
the line D(X) consists of the area in the X,Y plane that is 
enclosed by the upper and lower hyperbolae 

     D(X) ± sqrt(2 F s^2_D(X)), 

where F is the tabled critical value at level α. 
These hyperbolae can be constructed by using Equation 5 for 
successive values of X. In the present example, F = 3.44. 
(A degree of freedom was subtracted to "adjust" for a 
team-taught class.) The interval for D(X) at X = 40.7, then, is 
D(X) ± sqrt(2(3.44)(4.60)) = -2.43 ± 5.63. Equation 5 is 
employed for as many values of X as necessary to detect the 
shape of the hyperbolae (see Figure 11). (Thus, s^2_D(X) 
need only be determined for values of X for which Equation 
5 is employed.) 
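
The band can be traced numerically by evaluating the interval at a series of X values. The following is a sketch under stated assumptions: it takes the critical value 3.44 and the coefficients and variances from the worked example as given, and the function name and the grid of X values are illustrative.

```python
import math

F_CRIT = 3.44  # critical value used in the worked example

def band_at(x):
    """Lower and upper limits of the simultaneous band at X = x."""
    d = -9.394 + 0.171 * x                          # D(X)
    variance = 1.062 + 0.0405 * (x - 50.052) ** 2   # estimated variance of D(X)
    half_width = math.sqrt(2 * F_CRIT * variance)
    return d - half_width, d + half_width

lower, upper = band_at(40.7)
half_width_at_min = round((upper - lower) / 2, 2)

# Evaluate the band at a few points across the observed pretest range
# (40.7 to 59.6) to see the shape of the two hyperbolae.
grid = [40.7, 45.0, 50.052, 55.0, 59.6]
band = [(x, band_at(x)) for x in grid]
```

Plotting the lower and upper limits against X gives the two hyperbolae, which are closest together at the evaluation point and spread apart toward either end of the range.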
Adjusting Statistics for Weighting Class Means 
The following ratio was used to adjust certain statistics 
for weighting the class means: 

     M - k - 1 
     --------- 
     N - k - 1 

where the actual number of classes is denoted by M, k refers 
to the number of predictors in the regression, and N represents 
the total number of students. For any adjustment in 
the present context, M = 27 and N = 631. 

k = 1 for adjusting statistics corresponding to the 
within-group regressions (Table 11) , because there is one 
predictor (i.e., the pretest, X). For adjusting statistics 
associated with the J-N regressions (Tables 12 and 13), 
k = 3 (i.e., T, X, and TX). 

This ratio was used to adjust three statistics: the 
standard error for a regression coefficient (which was 
divided by the square root of the adjusting ratio), the 
F ratio (multiplied by the adjusting ratio), and the elements 
of the variance-covariance matrix for the J-N regression 
(divided by the adjusting ratio). 
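
The adjustment can be illustrated with a short calculation. The sketch below assumes only the ratio and the three adjustments described above; the function names are illustrative and do not come from the original analysis.

```python
import math

def adjusting_ratio(m, n, k):
    """Ratio of class-level to student-level residual degrees of freedom."""
    return (m - k - 1) / (n - k - 1)

M, N = 27, 631  # number of classes and number of students

ratio_within = adjusting_ratio(M, N, 1)  # within-group regressions, 25/629
ratio_jn = adjusting_ratio(M, N, 3)      # J-N regressions, 23/627

def adjust_standard_error(se, ratio):
    """Standard errors are divided by the square root of the ratio."""
    return se / math.sqrt(ratio)

def adjust_f_ratio(f, ratio):
    """F ratios are multiplied by the ratio."""
    return f * ratio

def adjust_covariance_element(element, ratio):
    """Variance-covariance elements are divided by the ratio."""
    return element / ratio
```

Because the ratio is far below 1 (about .04 in both cases), the adjustment enlarges standard errors and variances and shrinks F ratios, so that the statistics reflect the number of classes and not the number of students.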
The Procedure for Constructing a 
95% Confidence Interval for the 
Ancova Estimate of the Treatment Effect 
Outlined here is the procedure for deriving a 95% 
confidence interval for the ancova estimate of the treatment 
effect. (This procedure is discussed in greater detail 
in Rogosa [1980].) The CTBS total score is used as an 
example. 

The ancova estimate of the treatment effect is equivalent 
to the difference of the two sample within-group 
regression lines evaluated at C_a, where C_a is the point on X 
corresponding to the weighted average of the two group means. 
Thus, the ancova estimate is the D(X) at C_a or, equivalently, 
D(C_a). A 100(1 - α) percent confidence interval for D(C_a) 
is bounded by the end points 

     D(C_a) ± sqrt(F s^2_D(C_a)), 

where s^2_D(C_a) is the estimated variance of D(C_a). (See 
Appendix A for further discussion of s^2_D(C_a).) 

As reported in Table 14, D(C_a) for the CTBS total score 
is -.86. s^2_D(C_a) = 1.062 (see Appendix A) and F = 4.28. 
(A degree of freedom was subtracted to "adjust" for a 
team-taught class.) Thus, the 95% confidence interval for D(C_a) 
is -.86 ± sqrt((4.28)(1.062)) = -.86 ± 2.13. 
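
The half-width of this interval follows directly from the two quantities quoted in the example. The sketch below is illustrative: the function name is an assumption, and the center of the interval is recomputed here from the rounded coefficients of D(X) given in Appendix A, so it may differ in the second decimal place from the value reported in Table 14.

```python
import math

def ancova_interval(estimate, variance, f_crit):
    """Confidence limits for the ancova estimate: estimate +/- sqrt(F * variance)."""
    half_width = math.sqrt(f_crit * variance)
    return estimate - half_width, estimate + half_width, half_width

# Center recomputed from rounded coefficients (an approximation to Table 14).
estimate = -9.394 + 0.171 * 50.052

lower, upper, half_width = ancova_interval(estimate, variance=1.062, f_crit=4.28)
```

Since the interval extends about 2.13 points on either side of an estimate below 1 point in absolute value, it includes zero.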

In contrast to the conventional procedure for constructing 
a confidence interval for the ancova treatment effect, 
the procedure outlined here does not require the assumption 
that β_4 = 0. Consequently, when the test statistic for 
the null hypothesis β_4 = 0 (i.e., b_4^2/s_44) is greater than 
1, the latter procedure results in a narrower confidence 
interval. 